A. DIRECTOR'S OVERVIEW

This document is the third-year annual report of the Cornell Joint Services
Electronics Program (JSEP) for the period from May 1, 1992 to April 30, 1993. The
present Cornell program carries two themes: femtosecond carrier processes in
compound semiconductors, and real time signal processing. The program
has advanced according to plan. Seven task investigators, Profs. R. Shealy, C.
Tang, C. Pollock, P. Krusius, A. Bojanczyk, F. Luk, and H. Torng, with their
graduate students have contributed to JSEP research this year. A substitute
task for G. Bilardi's effort with Prof. Adam Bojanczyk was started on
September 30, 1991. F. Luk has been on a leave of absence from Cornell
University for most of the current period. He resigned from Cornell during the fall
of 1992 in order to take a position as the chairman of the Computer Science
Department at Rensselaer Polytechnic Institute, but has continued to
supervise his JSEP graduate students to the end of the current program
period. Ten graduate students have been partially or fully supported by JSEP
this year. A total of 29 publications and three theses were prepared in this
period or are now in various stages of processing. Two Ph.D. degrees and one
M.S. degree have been awarded to JSEP supported students during this
reporting period.

A proposal for the continuation of JSEP research for the next three-year
period, starting May 1, 1993, was submitted on August 1, 1992. In this
proposal all research was focused on the fundamentals of high speed
photonic devices including issues from materials to devices. An on-site
review of the proposal was held at Cornell on October 29 and 30, 1992.

B.  DESCRIPTION  OF  SPECIAL  ACCOMPLISHMENTS  AND  TECHNOLOGY 
TRANSITION 

B.1 Femtosecond Carrier Processes in Compound Semiconductors

Several  significant  achievements  have  been  reached  in  the  research 
performed  under  the  compound  semiconductor  theme.  The  new  off-campus 
organometallic  vapor  phase  epitaxial  (OMVPE)  compound  semiconductor 
materials  growth  facility  is  in  its  second  year  of  operation  under  the 
leadership  of  Prof.  R.  Shealy.  The  first  reactor  has  been  in  routine  operation, 
while two other reactors are being readied for materials growth. In the first
JSEP task, high purity GaAs films have been grown by flow modulation at low
V/III ratios, which greatly reduces the environmental impact and life safety
risk of the hydride based OMVPE growth processes. Good morphology was
obtained for V/III ratios as low as unity at growth temperatures above 610°C.
77 K mobility values for GaAs reached 110,000 cm^2/V s for a V/III ratio of 5. Gas
phase reactions between TMAA, TEG, and AsH3 have been examined in
detail. This study provides the much needed understanding of AlGaAs
growth chemistry with these gases. The full width at half maximum exciton
linewidth of 2.2 meV measured from AlGaAs films grown with this understanding
compares well with the best ever reported result for that composition. Finally,
AlGaInAs layers lattice matched to InP substrates, and strained layer
superlattices, have been grown with the flow modulation technique in
Cornell's unique multi-chamber OMVPE reactor. This will make it possible to
study new ordered quaternary structures with improved
electronic properties in this materials system.

The  research  into  new  tunable  femtosecond  sources  has  continued  in  the 
second  task  under  Prof.  C.  Tang.  This  group  has  demonstrated  the  generation 
of high repetition rate femtosecond pulses in the blue for the first time by
intracavity doubling of a mode locked Ti:sapphire laser using β-BaB2O4. Over
800 mW of 430 nm light was produced with intracavity pumping at a
repetition rate of 72 MHz and pulse width of 54 fs. The first broadly tunable
femtosecond optical parametric oscillator, working from the deep red to the
mid ir, was demonstrated. Tunable femtosecond pulses from 720 nm to 3.5
µm at a repetition rate of 80 MHz and power level of a few mW have been
generated.  Via  external  cavity  pumping  with  a  Ti:sapphire  laser,  this  group 
produced  record  output  powers  up  to  3.8  W  and  tunable  femtosecond  pulses 
in  the  multi-hundred  mW  range  at  a  total  conversion  efficiency  of  55%. 
These  new  femtosecond  sources  will  have  a  broad  impact  and  provide  a 
wealth  of  new  probes  to  the  scientific  community. 
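
As a rough check on the blue-source figures quoted above, the energy per pulse follows from the average power divided by the repetition rate, and an order-of-magnitude peak power follows from the pulse energy divided by the pulse width. The short Python sketch below is illustrative only: the function names are hypothetical, and the peak-power estimate assumes a rectangular pulse shape, which the report does not state.

```python
def pulse_energy_j(average_power_w, repetition_rate_hz):
    """Energy per pulse in joules: average power divided by repetition rate."""
    return average_power_w / repetition_rate_hz


def peak_power_w(average_power_w, repetition_rate_hz, pulse_width_s):
    """Rough peak power in watts, assuming a rectangular pulse (an assumption)."""
    return pulse_energy_j(average_power_w, repetition_rate_hz) / pulse_width_s


# Values quoted in the text for the intracavity-doubled Ti:sapphire source:
# 800 mW average power, 72 MHz repetition rate, 54 fs pulse width.
blue_energy_j = pulse_energy_j(0.8, 72e6)       # about 11 nJ per pulse
blue_peak_w = peak_power_w(0.8, 72e6, 54e-15)   # about 2e5 W
print(f"{blue_energy_j * 1e9:.1f} nJ per pulse, {blue_peak_w / 1e3:.0f} kW peak")
```

A real pulse shape (for example sech^2 or Gaussian) changes the peak power by a factor of order unity, so the result should be read as an estimate only.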

C.  Pollock's  research  group  has  refined  their  tunable  femtosecond  optical 
characterization  system  for  equal  pulse  and  pump-and-probe  configurations. 
Quality data can now routinely be taken. New data on near band gap carrier
relaxation in undoped InGaAs thin films on transparent InP substrates have
been measured as a function of photon energy and light intensity.
Measurements  on  the  effect  of  background  charge,  introduced  via  doping  or 
synchronized  optical  pulses,  are  in  progress.  The  analysis  of  the  complex 
carrier  relaxation  behavior  has  been  done  in  collaboration  with  the  Monte 
Carlo  group  of  J.P.  Krusius.  This  group  has  for  the  first  time  successfully 
simulated  the  relaxation  of  optically  generated  near  band  gap  carriers 
(electrons  and  holes)  in  femtosecond  pump-and-probe  and  equal  pulse 
correlation  measurements.  Via  correlations  of  measurements  and 
simulations  these  two  groups  have  been  able  to  prove  that  band 
renormalization  and  dynamic  screening  are  the  two  most  important 
phenomena  beyond  the  usual  carrier  scattering  processes.  The  excellent 
agreement  obtained  between  simulated  and  measured  results  shows  that  the 
self-consistent  ensemble  Monte  Carlo  method  works  well  to  time  scales  as 
short  as  150  fs,  if  all  important  physical  processes  are  included.  The 
understanding  of  the  role  of  band  renormalization,  dynamic  screening,  and 
the  Coulomb  enhancement  of  the  optical  transition  elements  is  likely  to  have 
a  profound  effect  on  high  speed  optoelectronic  devices,  an  area  which  will  be 
examined  by  the  Monte  Carlo  group  under  the  new  proposed  JSEP  work. 

Further special accomplishments are listed in the description of research
under each of the tasks.

B.2  Real  Time  Signal  Processing 

The investigators involved in the real time signal processing theme, Profs. A.
Bojanczyk, F. Luk, and H. Torng, have continued their synergistic work. A.
Bojanczyk's  research  was  focused  on  recursive  windowed  least  squares 
problems  arising  in  real-time  DSP  systems  derived  from  the  covariance 
differencing  principle.  New  highly  concurrent  algorithms  amenable  to 
efficient  parallel  implementations  were  developed.  H.  Torng's  research  group 
has  continued  work  on  interrupt  handling,  branch  processing,  and 
implementation  of  the  dispatch  stack  in  architectures  with  multiple 
functional  units.  These  advances  significantly  enhance  the  performance  of 
superscalar processors without raising the clock rate. H. Torng organized the
fourth "Project 2000" meeting on June 2 and 3, 1992 at Cornell to report on
computer  engineering  advances  in  the  past  year.  About  25  industrial 
representatives  attended  this  two  day  meeting.  F.  Luk  has  again  organized 
the  Advanced  Signal  Processing  Algorithms,  Architectures  and 
Implementations  Symposium  for  the  SPIE  and  edited  the  proceedings  (July 
1992). 

Further  special  accomplishments  are  listed  in  the  description  of  research 
under  each  of  the  tasks. 

OMVPE GROWTH OF III-V ALLOYS FOR NEW
HIGH  SPEED  ELECTRON  DEVICES 


Task  #1 

Task Principal Investigator: James R. Shealy

(607)  255-4657 


OBJECTIVE

The  program  objective  for  the  JSEP  materials  task  is  to  explore  the  use  of 
advanced  OMVPE  processes  including  Flow  Modulation  Epitaxy  for  improved 
III-V  structures.  Studies  have  been  concluded  on  AlGaAs  structures  using  a 
new Al source which results in the highest purity AlGaAs with ultra low
oxygen and carbon contamination. We are currently emphasizing new
structures lattice matched to InP for electronic and optoelectronic devices.
These include ordered strained layer superlattices of the quaternary AlGaInAs.
The transport properties of such structures will be studied in other JSEP tasks.
Several device structures on InP will be prepared, such as quantum well lasers
and advanced transistor structures, to confirm the improvements sought in
materials synthesis, as well as to support the new investigators in the new
JSEP program (R. Compton and Y-H Lo).

DISCUSSION  OF  STATE-OF-THE-ART 

The  following  discussion  of  the  state-of-the-art  is  organized  into  separate 
sections  on  growth  of  high  quality  GaAs  with  low  hydride  consumption,  gas 
phase  reactions  during  the  OMVPE  growth  of  AlGaAs  using  trimethylamine 
alane, triethylgallium, and arsine, and the growth of AlGaInAs on InP.

Low  Hydride  Consumption  using  Flow  Modulation  Epitaxy 

One  of  the  major  disadvantages  of  hydride-based  low  pressure  OMVPE  has 
been  the  inefficient  use  of  hydrides.  For  example,  in  conventional  reduced 
pressure OMVPE growth of GaAs using trimethylgallium (TMG) and arsine
(AsH3), high molar V/III ratios are needed to obtain high purity material [1].
Efforts have been made to reduce AsH3 consumption, including precracking of
the AsH3 [2] and substituting triethylgallium (TEG) for TMG [3]. In some cases,
less toxic group V sources have been substituted for AsH3, but this also requires
V/III ratios of 10 or greater to yield high purity material [4]. Low V/III ratios are
used in metalorganic molecular beam epitaxy [5], but the best GaAs results are
p-type and have carbon concentrations exceeding 5(10^14).

Despite the attempts to reduce the AsH3 requirement to produce high quality
GaAs, most low pressure OMVPE growth is still done at relatively high V/III
ratios. This poses a potential safety problem due to the expulsion of the excess
hydride that does not participate in the growth process and to the increased
handling of the source containers. Efforts to minimize high pressure cylinder
storage include an on-demand arsine gas generator, but a low 77 K mobility was
observed [6]. Since conventional OMVPE is not able to produce high quality
GaAs at V/III ratios near unity, a new growth process was sought.

Gas  Phase  Reactions  of  Trimethylamine  Alane  in  the  Growth  of  AlGaAs 

The OMVPE growth of aluminum containing III-V compound semiconductors
has  traditionally  been  plagued  with  high  oxygen  and  carbon  incorporation. 
These  problems  are  due,  in  large  part,  to  the  widely  used  aluminum  source, 
trimethylaluminum  (TMA).  Triethylaluminum  (TEA)  is  also  used  as  an 
aluminum  source,  and  it  has  demonstrated  lower  carbon  incorporation  in 
AlGaAs  than  TMA.  However,  some  residual  oxygen  still  remains  using  TEA, 
and  TEA  has  a  low  vapor  pressure  (0.5  torr  at  55°C)  which  is  inconvenient  for 
OMVPE. 

Recently, trimethylamine alane (TMAA) has received much attention as a
viable alternative aluminum source in both OMVPE [7] and Chemical Beam
Epitaxy (CBE) [8]. Previous reports indicate that using TMAA along with TEG
and AsH3 under the appropriate growth conditions (very high V/III ratios and
gas velocities) can result in high purity OMVPE grown AlGaAs [7]. This is
believed  to  be  due  to  the  lack  of  a  direct  Al-C  bond  in  TMAA  and  to  its  ability  to 
form  involatile  Al-O  compounds  when  reacted  with  oxygen  and  H2O,  resulting 
in  reduced  oxygen  contamination. 

However,  TMAA  suffers  two  major  drawbacks.  Its  low  thermal  decomposition 
temperature  leads  to  predeposition  on  the  reaction  cell  sidewalls,  and  its  strong 
room  temperature  reactions  with  conventional  Ga  sources  (TMG  and  TEG) 
result  in  unwanted  parasitic  reactions  during  growth.  The  growth  chemistry 
using  these  precursors  in  OMVPE  must  be  understood  in  order  to  optimize 
film  quality.  Although  studies  investigating  the  effects  of  gas-phase  reaction 
between  TMAA  and  TEG  in  CBE  have  been  reported  [8],  no  previous  study 
exists  for  OMVPE. 

Short  period  Superlattices 

Short  period  superlattices  have  been  considered  to  be  a  viable  alternative  to  the 
conventional  use  of  bulk  ternary  and  quaternary  layers.  The  ordered 
arrangement  of  the  constituent  atoms  in  the  superlattice  can  improve  the 
optical  and  electrical  properties  of  the  superlattice  over  those  of  the  bulk 
material.  Potential  fluctuations  associated  with  the  random  arrangement  of 
constituent  atoms  in  the  mixed  crystal  are  absent  from  the  superlattice 
structure, resulting in reduced alloy scattering and consequently
increased mobility [9]. Furthermore, exciton broadening due to clustering in
the  mixed  crystal  can  be  avoided  by  using  short  period  superlattices. 

The  use  of  short  period,  strained  layer  superlattices  is  especially  important  in 
the growth of AlxIn1-xAs and GaxIn1-xAs on InP. Because these material
systems  are  lattice  matched  to  InP  when  x  is  approximately  0.5,  the  alloy 
scattering  and  exciton  broadening  effects  due  to  the  mixed  crystal  are  close  to 
their  maximum.  Thus,  short  period,  strained  layer  superlattices  should 
improve  optical  and  electrical  quality  in  these  alloys  dramatically. 

PROGRESS

In  this  section,  progress  on  the  flow  modulation  growth  of  GaAs  is  presented,  a 
basic study of the gas phase reactions of TMAA with TEG and AsH3 is
summarized, and recent progress on the growth of AlGaInAs lattice matched to InP
is  outlined. 

Flow  Modulation  growth  of  high  purity  GaAs 

Using  arsine  and  triethylgallium  with  flow  modulation,  a  process  has  been 
developed  that  produces  high  purity  GaAs  with  V/III  ratios  near  unity  and 
AsH3  incorporation  efficiency  exceeding  30%  [10].  This  process  greatly  reduces 
the  environmental  impact  and  life  safety  risk  of  the  hydride-based  OMVPE 
method. 

All  films  were  grown  using  flow  modulation  epitaxy  (FME)  at  76  torr  in  a 
unique multichamber cell [11]. The substrates are rotated through group III and
group  V  spatially  rich  zones  without  valve  switching.  During  the  group  III 
exposure  cycle,  the  local  V/III  ratio  is  estimated  to  be  25%  of  the  average  value. 
The  growth  cell  and  exposure  cycle  for  the  growth  scheme  are  depicted 
schematically  in  Figure  1.  The  aluminum  source  is  included  for  completeness 
and  will  be  referred  to  in  the  next  progress  section. 

The  arsine  flow  requirements  for  obtaining  good  morphology  have  been 
determined  over  a  wide  range  of  growth  temperatures.  Good  morphology  was 
realized  for  V/III  ratios  as  low  as  unity  for  growth  temperatures  exceeding 
610°C.  Below  610°C,  the  V/III  ratio  must  be  increased  to  maintain  good  surface 
morphology  due  to  inefficient  pyrolysis  of  the  arsine. 


Figure 1. (a) Schematic illustration (top view) of the implementation of FME in the multichamber cell. Two
TEG sources, one on each growth zone, allow for conventional premixed injection or spatially
separated group III sources. The arsine is uniformly injected around the cell. The inner quartz
ampoule (diameter d) serves to separate the reactant fluxes of each deposition zone. (b) Exposure
cycle for premixed and spatially separated TMAA and TEG. The AsH3 flow is uniformly
distributed around the cell. Dotted lines represent the reactant zero flux reference. The degree of
deposition zone separation is indicated schematically by the set of arrows in the lower diagram.

The electrical quality of the films grown at 635°C and with V/III ratios ranging
from 1.8 to 22 was assessed using room temperature and 77 K Hall
measurements. All films were n-type; no p/n crossover with decreasing V/III
ratio was observed. The net impurity concentration varied from 3.7(10^14) to
6.9(10^14) cm^-3 while the total impurity concentration varied from 7.7(10^14) to
2.0(10^15) cm^-3. The 77 K mobility ranged from 55,000 to 110,000 cm^2/V s, with
the maximum value obtained at a V/III ratio of 5. A V/III ratio of 1.8
(corresponding to an arsine efficiency of 17.2%) resulted in a 77 K mobility of
93,000 cm^2/V s. The highest room temperature mobility was greater than 8000
cm^2/V s.
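
If the net and total impurity concentrations are taken to mean N_D - N_A and N_D + N_A respectively (the usual convention in Hall analysis of n-type GaAs, assumed here rather than stated in the report), the donor and acceptor densities and the compensation ratio follow directly. The Python sketch below uses hypothetical function names, and it pairs the lower ends of the two quoted ranges purely for illustration; the report does not say which net value belongs with which total value.

```python
def donor_acceptor_from_hall(net_cm3, total_cm3):
    """Return (N_D, N_A) in cm^-3 for n-type material.

    Assumes net = N_D - N_A and total = N_D + N_A (a labeled assumption).
    """
    if net_cm3 < 0 or total_cm3 < net_cm3:
        raise ValueError("expected 0 <= net <= total for n-type material")
    n_d = 0.5 * (total_cm3 + net_cm3)
    n_a = 0.5 * (total_cm3 - net_cm3)
    return n_d, n_a


def compensation_ratio(net_cm3, total_cm3):
    """Compensation ratio N_A / N_D under the same assumption."""
    n_d, n_a = donor_acceptor_from_hall(net_cm3, total_cm3)
    return n_a / n_d


# Illustrative pairing only: lower end of each quoted range.
n_d, n_a = donor_acceptor_from_hall(3.7e14, 7.7e14)
ratio = compensation_ratio(3.7e14, 7.7e14)
print(f"N_D = {n_d:.2e} cm^-3, N_A = {n_a:.2e} cm^-3, N_A/N_D = {ratio:.2f}")
```

With this pairing the sketch gives N_D near 5.7(10^14) cm^-3 and N_A near 2.0(10^14) cm^-3; other pairings of the quoted ranges would give different values.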

Optical  quality  of  the  samples  grown  at  635°C  was  evaluated  using  low 
temperature  (1  K)  photoluminescence  (PL).  For  V/III  ratios  greater  than  1.8,  the 
dominant  feature  in  the  excitonic  region  of  the  spectra  was  the  neutral  donor 
exciton (D°, X). This, along with a negligible acceptor exciton feature, indicates
strongly n-type material, which is consistent with the Hall measurements.
Films grown with V/III ratios greater than 1.8 demonstrated a single acceptor-related
transition attributed to Mg, while those grown at V/III ratios between
1.8  and  1  showed  acceptor  transitions  due  to  Mg  and  C.  Significantly,  normal 
excitonic  features  persisted  until  the  V/III  ratio  was  reduced  to  subunity.  In  the 
subunity  film,  new  features  near  1.503  eV,  possibly  due  to  defect  related  exciton 
emission,  appeared.  Films  grown  at  subunity  V/III  ratios  also  exhibited  poor 
surface  morphology  and  reduced  growth  rates,  indicative  of  arsenic  limited 
growth. 

Gas  Phase  Reactions  of  Trimethylamine  Alane 

Gas phase reactions between TMAA, TEG, and AsH3 have been investigated
[12]. Two dominant effects are observed: a TMAA-AsH3 reaction causes the
film composition to vary with V/III ratio, and a TMAA-TEG reaction
severely degrades thickness uniformity. The effects of each of these gas phase
reactions  in  the  upstream  portion  of  the  reaction  cell  were  identified  by 
spatially  separating  TMAA  and  TEG  in  the  gas  phase  using  the  multichamber 
growth  cell  (see  Figure  1).  Material  grown  in  the  traditional  premixed  growth 
mode  was  compared  with  that  grown  in  the  new  spatial  separation  injection 
scheme. 

The TMAA-AsH3 gas phase reaction was investigated in the conventional
premixed growth mode by holding the growth temperature (670°C) and group
III flux constant and varying the AsH3 flow (V/III ratio). As shown in Figure 2,
the Al content (determined by Raman scattering and confirmed by double
crystal x-ray diffraction) was independent of V/III ratio at large V/III ratios (20-80)
but increased as the V/III ratio was decreased below 20. This effect is
attributed to the TMAA-AsH3 reaction. At low V/III ratios, less AsH3 is present
to form involatile compounds with the TMAA and prevent its incorporation
in the growing film. Although growth at low V/III ratios is more efficient, the
optical quality degrades with decreasing V/III ratio. The full width at half
maximum exciton linewidth of 2.2 meV compares favorably with the
narrowest linewidth ever reported for AlGaAs of that composition [13], and the
reduced but still clearly identifiable exciton for the sample grown with a V/III
ratio of 7.5 is significant. The fact that large arsine flows are needed to produce
AlGaAs with good optical quality may imply that the TMAA-AsH3 reaction
inhibits  the  TMAA-TEG  reaction  which  is  demonstrated  below  to  severely 
degrade  the  quality  of  the  AlGaAs  films. 

Figure 2. Dependence of Al composition on V/III ratio for constant TEG and TMAA fluxes at 670°C.
The inset depicts the corresponding 1 K photoluminescence spectra for several V/III ratios.

To investigate the effect of the TMAA-TEG gas phase reaction on the growth,
the reactant fluxes were held constant and a relatively high V/III ratio
(V/III = 50) was used to eliminate the effects of arsine flow described above.
Premixed  growth  of  AlGaAs  resulted  in  a  growth  rate  that  was  approximately 
half that of GaAs grown with the same TEG reactant flux. This was attributed to
depletion  of  Ga  in  the  gas  phase  by  the  TMAA-TEG  reaction  and  confirmed  by 
the  drop  in  Al  composition  from  79%  during  premixed  growth  to  40%  during 
spatially  separated  growth.  Although  both  premixed  and  spatially  separated 
grown  material  exhibited  excellent  compositional  uniformity,  the  broad  x-ray 
epitaxial  peak  characteristic  of  premixed  grown  material  is  believed  to  be  due 
to  compositional  grading.  Furthermore,  severe  thickness  nonuniformity  (16% 
over  a  20  mm  diameter)  was  characteristic  of  premixed  growth  while  excellent 
thickness  uniformity  (1%  over  a  20  mm  diameter)  was  obtained  during 
spatially  separated  growth.  Finally,  the  optical  quality  of  the  premixed  growth 
as  determined  by  low  temperature  photoluminescence  was  inferior  to  that  of 
the  spatially  separated  grown  AlGaAs.  A  possible  explanation  for  this  effect  is 
that in addition to the TMAA and TEG forming nonvolatile compounds which
reduce the growth rate, volatile compounds are also present which participate
in the growth process and incorporate non-radiative centers in the epitaxial
layers.


AlGaInAs Lattice Matched to InP


GaInAs and AlInAs lattice matched to InP are currently being grown by flow
modulation. The structural quality and lattice parameter of the films were
determined by Raman scattering and double crystal x-ray diffraction. Typical
x-ray diffraction rocking curves are shown in Figure 3. The broad shoulder is
indicative of graded indium content along the growth direction. Growth on
InP has been suspended to modify the reactor and correct this grading problem.
A Thomas Swan Epison gas concentration analyzer has been ordered and will
be used to precisely monitor and control (with feedback) the TMIn flux out of the
bubbler. Once the Epison is in place, growth of AlGaInAs will continue. After
growth conditions of the mixed crystals are optimized, the study will be
broadened to identify the effects of premixing and spatially separating the
organometallic precursors and to compare the properties of the mixed crystal
and the corresponding short-period superlattices. Finally, once the stimulated
growth reactor is on-line, these basic studies will help to accelerate the selected
growth of AlGaInAs on InP.


Figure 3. GaInAs on InP. Both In (bottom curve) and Ga (top curve) rich material exhibit broad
epitaxial x-ray rocking curve peaks. The shoulder on the negative angle side of each curve is
indicative of graded material.

SCIENTIFIC IMPACT OF RESEARCH

The scientific impact of this research is threefold. First, the growth of high
quality AlGaAs by flow modulation at low V/III ratios greatly reduces the
environmental impact and life safety risk of the hydride based OMVPE growth
process. Second, the investigation of the gas phase reactions of TMAA with
TEG and AsH3 provides a much needed explanation of the growth chemistry of
AlGaAs with these reactants. It should be noted that this study would have
been impossible without Cornell's unique multichamber reactor. Finally, the
growth of AlGaInAs alloys and strained layer superlattices on InP in the
multichamber cell will allow the investigation of new ordered quaternary
structures for improved electronic properties.

FEMTOSECOND LASER STUDIES OF ULTRAFAST PROCESSES IN
COMPOUND  SEMICONDUCTORS 

Task  #2 

Task Principal Investigator: C. L. Tang

(607) 255-5120


OBJECTIVE

The objective of this task is to develop new femtosecond sources and
measurement  techniques  and  to  use  such  sources  and  techniques  to  study 
ultrafast  processes  in  semiconductors  and  related  quantum  structures.  On 
source development, current emphasis is on high repetition rate all-solid-state
femtosecond sources and on extending the tuning range of such sources,
in  particular  to  the  mid  infrared  and  to  the  visible.  On  optical  measurement 
techniques,  current  emphasis  is  on  developing  optical  sampling  techniques 
with  femtosecond  time  resolution  based  on  the  up-conversion  process.  These 
sources  and  techniques  are  being  applied  to  the  study  of  the  relaxation 
dynamics  of  non-equilibrium  carriers  in  elemental  and  compound 
semiconductors  and  quantum  well  structures.  The  capture  problem  and  the 
problem  of  tunneling  of  coherent  wave  packets  in  quantum  wells  are  of 
particular  interest  at  the  present  time. 


DISCUSSION OF STATE-OF-THE-ART

Almost all the work on femtosecond optics and ultrafast processes in the past
has  been  based  on  the  use  of  the  mode-locked  Rh6G  femtosecond  dye  laser  as 
the  primary  source  of  short  pulses  of  light.  The  trend  recently  has  been  to 
move away from the dye lasers to all-solid-state short pulse sources. The CW
mode-locked Ti-doped sapphire laser has been the most widely used new
primary femtosecond laser source. The Ti:sapphire laser is tunable over the
range of 720 nm to about 1 µm. The emphasis of our work has been to
extend  the  useful  spectral  range  of  all-solid-state  femtosecond  lasers  to  beyond 
this  range  through  nonlinear  optical  techniques.  Very  significant  progress 
has  been  made  in  this  effort  during  the  past  year  and  the  results  are  discussed 
below  in  the  Progress  section. 

In  the  case  of  femtosecond  optical  measurement  techniques,  most  of  the  past 
studies  of  ultrafast  phenomena  have  been  based  upon  some  sort  of  pump- 
probe  measurement,  including  the  related  optical  correlation  spectroscopic 
techniques. All these techniques suffer from the fact that during the probing
process,  the  system  being  measured  is  also  disturbed  to  some  extent.  To  avoid 
perturbing  the  system  being  measured,  the  time-resolved  hot  luminescence 
up-conversion  technique  has  been  developed  to  study  the  relaxation 
dynamics of non-equilibrium carriers in semiconductors at a number of
laboratories  recently,  including  Cornell.  This  technique  allows  optical 
sampling  with  a  time  resolution  on  the  order  of  50  fs  of  the  very  weak  hot 
luminescence  emitted  by  the  carriers  during  the  relaxation  process.  This 
technique has now been developed to the point where the dark noise count is down
to  half  a  photon  per  second  and  has  been  used  successfully  to  yield 
unambiguous  data  on  the  cooling  rates  of  hot  carriers  in  bulk  GaAs  and 
GaAs/AlGaAs  quantum  wells  at  high  carrier  densities.  The  need  is  now  to  go 
beyond  the  GaAlAs  systems  and  apply  these  techniques  to  other  materials 
such as Si, porous Si, or the II-VI compounds and quantum structures.

In  terms  of  the  source  wavelength,  this  source  is  ideal  because 
photoexcitation  at  2  eV  can  prepare  carrier  states  up  to  500  meV  in  GaAs  and 
related  materials  and  structures.  With  the  new  tunable  sources  now  ranging 
from 0.5 eV to almost 4 eV, and with further extension to 10 µm a
distinct possibility [see proposed work below], we propose to go beyond the
GaAs-related materials. More specifically, we propose to study important
electronic and optical materials such as Si, Ge, ZnSe, InGaAs, InGaAsP, etc.,
and in structures such as quantum wells, quantum wires, superlattices, and in
particular  porous  Si.  The  current  state-of-the-art  is  such  that  virtually  no 
work  on  these  materials  has  been  done  on  the  very  short  femtosecond  time 
scale. 
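
The photon energies and wavelengths quoted in this task are related by E = hc/λ, which in convenient units is E (eV) ≈ 1239.84 / λ (nm). The Python sketch below converts between the two so that the quoted tuning ranges can be compared on one scale. The helper names are hypothetical and introduced only for illustration.

```python
HC_EV_NM = 1239.841984  # h*c in eV*nm (rounded value of the physical constant)


def nm_to_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def ev_to_nm(energy_ev):
    """Vacuum wavelength in nm for a photon energy in eV."""
    return HC_EV_NM / energy_ev


# Tuning-range endpoints and excitation energy quoted in the text.
for energy_ev in (0.5, 2.0, 4.0):
    print(f"{energy_ev} eV -> {ev_to_nm(energy_ev):.0f} nm")
print(f"720 nm -> {nm_to_ev(720.0):.2f} eV")
```

On this basis, the quoted 0.5 eV to 4 eV range corresponds to roughly 2.5 µm down to 310 nm, and an extension to 10 µm would reach about 0.12 eV.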

Si is obviously the most important electronic material. There is a clear need
to  explore  the  possibility  of  using  some  form  of  Si  for  optoelectronic 
applications.  There  have  been  recent  reports  that  porous  Si  shows 
surprisingly  strong  photoluminescence  and  electro-luminescence  in  the 
visible.  It  has  been  suggested  that  the  luminescence  has  something  to  do  with 
reduced  dimensionality  and  quantum  confinement,  but  there  is  no  consensus 
on  either  the  experimental  results  or  their  interpretation.  Femtosecond  laser 
study  is  clearly  another  avenue  for  identifying  the  origin  of  the  luminescence 
and  clarifying  the  physics.  We  have  recently  developed  a  capability  [in 
collaboration  with  IBM  Yorktown  Heights  through  an  IBM  Fellowship 
student  working  both  at  Yorktown  Heights  and  at  Cornell]  to  prepare 
reproducible  porous  Si  samples.  We  hope  to  have  preliminary  results  on  the 
dynamics  of  photoexcited  carriers  in  porous  Si  in  the  near  future.  At  this 
point,  we  do  not  know  what  to  expect,  although  we  have  some  conjectures. 

There  has  been  a  great  deal  of  interest  in  ZnSe  semiconductor  lasers  in  the 
blue-green  region  recently.  However,  very  little  is  known  about  the 
relaxation  dynamics  of  hot  carriers  in  ZnSe.  A  number  of  the  important 
relaxation  processes  and  the  corresponding  rates  are  not  known.  To  develop 
room-temperature  cw  blue  semiconductor  lasers,  which  is  a  major  direction 
of  semiconductor  laser  research  today,  it  is  essential  that  these  parameters  be 
known. Our recently developed femtosecond laser in the blue is uniquely
suited for such studies. We are preparing for experimental studies on this
material  in  the  femtosecond  time  domain. 

Ternary and quaternary compound semiconductors are of importance to
long wavelength (1.3 µm and 1.5 µm) lasers. Ultrafast studies in these
materials  have  been  hampered  by  the  lack  of  suitable  ir  femtosecond  sources. 
The  ir  femtosecond  OPOs  developed  in  our  laboratory  are  uniquely  suited  for 
studying  these  materials.  The  relaxation  dynamics  in  these  materials  are 
more complicated than those in GaAs or AlGaAs. Understanding the
relaxation dynamics related to the Auger processes in InGaAs and
InGaAs/InP quantum wells is of particular importance for long wavelength
lasers; further work is needed.

In  terms  of  different  structures  with  reduced  dimensionalities,  planar  2-D 
GaAs/AlGaAs quantum wells have been studied extensively and are
relatively  well  known.  Because  of  the  difficulties  in  fabricating  quantum 
wires  and  boxes  with  dimensions  on  the  order  of  10  nm  or  less,  much  less  is 
known  about  these  structures.  There  is,  however,  a  great  deal  of  interest  in 
such structures for a variety of electronic and optical device applications. J. R.
Shealy has been able to fabricate quantum wires on the order of 30 × 50 nm
using MOCVD overgrowth on grooves etched in GaAs by focused ion beam.
With  better  control  in  the  growth  and  ion  beam  etching  processes,  fabrication 
of  wires  of  dimensions  small  enough  to  exhibit  quantum  effects  should  be 
possible.  Ultrafast  laser  studies  will  elucidate  the  relaxation  dynamics  of  hot 
carriers  in  such  quasi-one-dimensional  quantum  structures.  Because  of  the 
shifts  in  the  excitation  energies  of  various  relevant  states  due  to  quantum 
confinement,  wavelength  tunability  is,  again,  an  important  consideration  in 
photoexcitation  studies  in  such  structures. 

In  all  these  studies,  the  significantly  increased  power  of  the  new  femtosecond 
OPO  sources  will  greatly  increase  the  sensitivity  of  time-resolved  hot 
luminescence  spectroscopy  based  upon  the  nonlinear  optical  upconversion 
process  coupled  with  photon-counting.  Since  the  signals  in  most  such  pump- 
and-probe  measurement  techniques  are  proportional  to  at  least  the  square  of 
the  intensity  of  the  fs  pulses,  an  increase  of  2  orders  of  magnitude  in  the 
intensity  of  the  pulses  as  has  been  achieved  with  our  new  fs  OPOs  means  an 
increase  of  at  least  4  orders  of  magnitude  in  the  sensitivity  as  compared  to  the 
Rh6G-based  measurement  systems  currently  in  use.  This  improvement  in 
sensitivity  will  vastly  increase  our  capabilities  to  study  various  ultrafast 
processes. 
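
The scaling argument above can be checked with a short calculation. The sketch below assumes only that the detected signal scales as a power of the pulse intensity, with order 2 being the minimum quoted in the text. The function name and its parameters are illustrative and are not taken from the original work.

```python
import math


def sensitivity_gain(intensity_gain: float, order: int = 2) -> float:
    """Signal gain when the signal scales as intensity ** order."""
    if intensity_gain <= 0 or order < 1:
        raise ValueError("intensity_gain must be positive and order at least 1")
    return intensity_gain ** order


# Two orders of magnitude more intensity, square-law signal.
gain = sensitivity_gain(100.0, order=2)
print(f"signal gain: {gain:.0f} ({math.log10(gain):.0f} orders of magnitude)")
```

A higher-order dependence on intensity would only increase the gain, which is why the text states the improvement as a lower bound.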


PROGRESS 

New Developments in Femtosecond Sources

A principal direction of research in the field of femtosecond optics and
ultrafast processes is to extend the wavelength range of the femtosecond
sources and, as a consequence, expand the types of ultrafast processes and
materials that can be studied. In our last three-year proposal, one of the main
objectives was to extend the femtosecond laser source into the uv through
various  nonlinear  optical  sum-frequency  processes  and  into  the  ir  through 
the  optical  parametric  processes.  We  have  been  very  successful  in  both  areas. 

In the uv, we previously demonstrated [1] the shortest pulse (43 fs) at
315 nm. The power, in the 10 mW range, is relatively low by today's
standards but was quite respectable by the standards prevailing before our
most recent work in the last few months. Most recently we demonstrated [2]
the generation of high-repetition-rate femtosecond pulses in the blue for the
first time by intracavity doubling of a mode-locked Ti:sapphire laser using
β-BaB2O4. To reduce the pulse broadening effect of group-velocity mismatch,
an extremely thin β-BaB2O4 crystal was used. Over 800 mW of 430 nm light was
produced at a 72 MHz repetition rate, and pulse widths as short as 54 fs were achieved.
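
For orientation, the quoted average power, repetition rate, and pulse width can be turned into an energy per pulse and an approximate peak power. The sketch below is a back-of-the-envelope estimate, not a calculation from the original work: it takes peak power as pulse energy divided by pulse width and omits the pulse-shape factor, which is of order unity. The function names are illustrative.

```python
def pulse_energy_joules(average_power_w: float, repetition_rate_hz: float) -> float:
    """Energy per pulse for a pulse train with the given average power."""
    return average_power_w / repetition_rate_hz


def peak_power_watts(average_power_w: float, repetition_rate_hz: float,
                     pulse_width_s: float) -> float:
    """Rough peak power: pulse energy divided by pulse width.

    A pulse-shape factor of order unity is deliberately omitted.
    """
    return pulse_energy_joules(average_power_w, repetition_rate_hz) / pulse_width_s


# Figures quoted in the text: 800 mW average power, 72 MHz, 54 fs.
energy = pulse_energy_joules(0.8, 72e6)
peak = peak_power_watts(0.8, 72e6, 54e-15)
print(f"energy per pulse: {energy * 1e9:.1f} nJ")
print(f"peak power:       {peak / 1e3:.0f} kW")
```

With these inputs the estimate comes to roughly 11 nJ per pulse and a peak power of about 0.2 MW.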

From  the  deep  red  to  the  mid  ir,  we  have  demonstrated  the  first  broadly 
tunable  femtosecond  optical  parametric  oscillator  [3].  With  intracavity 
pumping  by  a  Rh6G  dye  laser,  tunable  femtosecond  pulses  from  720  nm  to 
nearly 3.5 µm have been achieved at a pulse repetition rate of 80 MHz and
a power level of a few mW [3-5]. Dramatically improved power levels have
been achieved most recently using external-cavity pumping by Ti:sapphire
lasers.  Record  femtosecond  Ti:sapphire  laser  output  powers  up  to  3.8  W  have 
been  achieved  recently  in  our  laboratory.  Tunable  femtosecond  pulses  in  the 
multi-hundred  mW  range  with  a  total  conversion  efficiency  of  55%  from 
femtosecond  optical  parametric  oscillators  (OPO)  have  been  demonstrated 
most  recently  [6-10].  While  the  intracavity  pumped  fs  OPO  is  extremely 
difficult to operate, the external-cavity Ti:sapphire-laser-pumped OPOs are
relatively  easy  to  construct  and  reproduce.  Since  our  initial  work,  there  has 
been  extensive  interest  on  the  part  of  various  laser  companies  to 
commercialize  the  technology.  Technology-transfer  agreements  have  been 
signed  with  one  of  the  largest  laser  companies,  Spectra-Physics.  With  the 
general  availability  of  such  powerful  tunable  fs  sources,  we  expect  that  there 
will  be  rapid  advances  in  the  field  of  ultrafast  studies  over  spectral  regions 
that  had  been  out  of  reach  before.  Figure  1  shows  a  summary  of  the  current 
status  of  tunable  high  repetition  rate  femtosecond  sources  and  the  pump 
sources  available.  Apart  from  the  two  pump  sources,  Rh6G  dye  laser  and  the 
Ti:sapphire  laser,  all  the  other  sources  were  first  demonstrated  in  our 
laboratory under primarily JSEP sponsorship.

Ultrafast Relaxation Dynamics in Semiconductors

The  investigation  of  hot-carrier  relaxation  in  both  quantum  well  (QW)  and 
bulk  semiconductor  structures  has  been  a  topic  of  great  importance  and 
considerable  interest.  An  investigation  of  the  hot-carrier  relaxation  in 
GaAs/(Al,Ga)As  quantum  wells  and  bulk  GaAs  in  the  high-carrier-density 
limit has been carried out [11-13]. Using a time-resolved luminescence
nonlinear optical up-conversion technique with ~80-fs temporal resolution,
carrier temperatures are measured in the 100-fs-to-2-ns range. The results
show unambiguously that the hot-carrier cooling rates in the quantum wells
are significantly slower than in the bulk for carrier densities greater than
2×10^18 cm^-3. A comparison is made with previous publications to resolve the
controversy  concerning  the  difference  in  cooling  rates  in  quasi-two-  and 
three-dimensional  systems.  These  results  are  important  for  applications 
related  to  injection  of  hot  carriers  in  high  speed  electronic  devices  and  high 
efficiency  solar  cells. 

SCIENTIFIC IMPACT OF RESEARCH

The  femtosecond  sources  and  measurement  techniques  developed  should  be 
of  great  use  to  others  in  the  scientific  community.  The  results  obtained  on 
the  dynamics  of  nonequilibrium  carriers  in  III-V  compounds  and  structures 
are  of  fundamental  importance  to  the  understanding  of  the  physics  and  the 
design of ultra-high speed semiconductor electronic and optical devices.

This task represents the device measurement portion of the JSEP program.
We  have  been  studying  ultrafast  relaxation  processes  in  narrow  bandgap 
semiconductors  using  tunable  ultrashort  optical  pulsed  lasers  to  determine 
carrier  dynamics.  These  measurements  have  been  made  in  bulk  materials 
using  thin  films.  Our  goal  is  to  generate  solid  experimental  evidence  of  the 
carrier  scattering  processes  that  occur  for  energies  near  the  bandedge  of  the 
semiconductor.  Data  has  been  used  to  test  modelling  calculations  performed 
by  Prof.  Krusius'  group.  In  this  interaction,  we  hope  to  both  improve  the 
quality  of  the  data  we  acquire,  and  to  test  the  models  developed  by  the 
Krusius  group  through  rigorous  comparison  of  data  with  modelling  results. 
We also plan on improving our equipment by extending the tuning range
and  pulse  performance  of  our  probe  lasers. 

DISCUSSION  OF  STATE-OF-THE-ART 

Studies  of  hot  carrier  relaxation  have  been  carried  out  in  GaAs  and  AlGaAs 
for  several  years.  Only  recently  has  there  been  the  ability  to  study  narrow 
bandgap semiconductors with femtosecond probes. Roskos et al. [1] have
reported on the cooling of photoexcited carriers in undoped InGaAs samples
similar to the films we are studying, although their work was done at 30 K and
with  significantly  higher  probe  energy.  Using  a  dye  laser  operating  at  1265 
nm,  they  created  hot  carrier  populations  approximately  200  meV  above  the 
conduction  band  minimum.  They  observed  recovery  of  the  transient 
absorption to occur with time constants of 1 and 10 psec, representing depletion
of the electron and heavy hole states, respectively. Because their excess
energy is higher than in our measurements, the observed relaxation lifetimes
are not directly comparable between their experiments and ours, as their
carrier distribution is influenced more strongly by Fröhlich scattering. Our
experiments provide a useful complement to this work, giving a picture of the
relative weights of the carrier thermalization mechanisms as one approaches the bandedge.

Nunenkamp et al. [2] describe an experiment similar to ours in a different
material, AlGaAs. They probed carrier relaxation near the band-edge of the
material using a tunable femtosecond laser. They developed a model
including excitonic effects; however, their data was taken at 30 K compared to
our 300 K. Significantly, they found band-gap renormalization essential to
fit  the  data  to  their  simulations.  This  is  consistent  with  the  results  we  have 
obtained  in  our  collaboration  with  the  modelling  efforts  of  Krusius'  group. 

PROGRESS

Progress  in  the  last  year  is  summarized  below.  We  have  worked  closely  with 
Krusius in measuring the carrier relaxation time in bulk GaInAs/InP layers.
The data from our measurements have been used to test the models
developed by Krusius and Bair, and results from their fitting have been used to
modify  our  experiments.  Emphasis  in  our  group  has  been  on  the 
experimental  side  of  the  problem,  especially  with  obtaining  exceptionally 
high  signal-to-noise  data  which  can  be  used  to  test  the  models. 

Several  layers  of  InGaAs  were  obtained  from  the  Cornell  School  of  Electrical 
Engineering  MBE  facility,  and  were  probed  with  our  tunable  color  center 
laser. We studied layers of three different thicknesses: 0.5 µm, 1 µm, and 3 µm.
The color center laser we used was able to generate 100 fsec pulses over the
0.75-0.85 eV range, which is near the bandgap of In0.53Ga0.47As lattice
matched to InP. A 100 fsec pulse from the laser was used to create carrier
densities  ranging  in  the  10^  ^  range.  Using  the  equal  pulse  autocorrelation 
technique  developed  by  Tang  [3],  a  second  pulse  probed  the  recovery  of  the 
optical  absorption  as  a  function  of  time  by  entering  the  sample  at  the  same 
point,  but  at  a  delayed  time.  As  the  carriers  recover  back  to  their  equilibrium 
distribution,  the  optical  absorption  returns  to  its  original  value. 
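
Because this section quotes laser tuning ranges in eV and sample thicknesses and probe wavelengths in µm, a small conversion helper can be useful when comparing the two. The sketch below uses the standard relation E = hc/λ with hc ≈ 1.2398 eV·µm. The function names are illustrative and not part of the original work.

```python
HC_EV_UM = 1.239841984  # Planck constant times speed of light, in eV*um


def ev_to_um(photon_energy_ev: float) -> float:
    """Vacuum wavelength in micrometres for a photon energy in eV."""
    return HC_EV_UM / photon_energy_ev


def um_to_ev(wavelength_um: float) -> float:
    """Photon energy in eV for a vacuum wavelength in micrometres."""
    return HC_EV_UM / wavelength_um


# Tuning range of the color center laser quoted in the text.
for energy_ev in (0.75, 0.85):
    print(f"{energy_ev:.2f} eV -> {ev_to_um(energy_ev):.3f} um")
```

By this relation the 0.75 to 0.85 eV tuning range corresponds to wavelengths of roughly 1.65 to 1.46 µm.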

We  observe  changes  in  transmission  on  the  order  of  approximately  1%  for  a 
carrier  density  change  of  lO^^/cm^.  A  typical  data  trace  is  shown  in  Fig.  1.  The 
figure shows the measured autocorrelation transmission trace from a 0.5 µm
thick InGaAs layer probed at 1.596 µm. The inner trace is the autocorrelation
of  the  probe  laser  pulse.  The  quality  of  the  data  can  be  rapidly  judged  by  the 
experimentalist  through  the  symmetry  of  the  plot.  This  data  was  fed  to 
Krusius  for  analysis,  and  is  discussed  in  his  task.  A  joint  publication  is  under 
preparation. 

Fitting  this  data  to  simple  single  and  double  exponential  decay  models 
showed  there  were  some  trends  with  increased  probe  energy.  Such  data  is 
shown in Fig. 2. However, based on simulations and discussions with
Krusius,  it  was  decided  that  the  decay  process  was  too  complicated  to  be 
described  in  terms  of  one  or  two  lifetimes. 
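
The limitation of a single-lifetime description can be illustrated on synthetic data. The report does not say how the fits were performed, so the sketch below is only one simple possibility: a least-squares straight-line fit to the logarithm of the signal. The decay amplitudes and time constants are hypothetical, chosen purely to show that a two-component decay yields different apparent lifetimes depending on the time window fitted.

```python
import math


def fit_single_exponential(times, values):
    """Fit values ~ A * exp(-t / tau) by least squares in log space.

    Returns (A, tau). All values must be positive.
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic two-component decay (hypothetical 100 fs and 1000 fs lifetimes).
times_fs = [10.0 * i for i in range(1, 101)]
signal = [0.7 * math.exp(-t / 100.0) + 0.3 * math.exp(-t / 1000.0) for t in times_fs]

_, tau_early = fit_single_exponential(times_fs[:20], signal[:20])
_, tau_late = fit_single_exponential(times_fs[60:], signal[60:])
print(f"apparent lifetime, early window: {tau_early:.0f} fs")
print(f"apparent lifetime, late window:  {tau_late:.0f} fs")
```

When the apparent lifetime depends this strongly on the fitting window, one or two exponentials are at best a summary of the data, which is consistent with the decision described above to compare against full simulations instead.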

Figure 1. An equal pulse autocorrelation trace taken with InGaAs at a wavelength of 1.596 µm.
The inner trace is the autocorrelation of the laser pulse; the outer trace is the transient
transmission of the semiconductor sample. [Horizontal axis: delay (fsec).]

Figure 2. Exponential fit results: exponential fits to the observed transient decay of the
transmission, as a function of excess energy and carrier density.

Initial comparison of our temporal data to the simulations of Krusius showed
that our raw data was not sufficiently noise-free to make substantive claims as
to the accuracy of the simulation. It was determined that the absolute
reproducibility (not simply signal-to-noise ratio) had to be better than 1 part
in 100. This posed quite a problem, as there are dozens of experimental effects
which can systematically alter the overall shape by such magnitudes. We
invested  a  significant  effort  into  reducing  systematic  errors,  by  linearizing 
detectors,  optimizing  the  PZT  transducers  and  galvodrivers,  and  by  carefully 
coating  and  selecting  all  optics  to  avoid  unexpected  reflections.  The  absolute 
reproducibility  (measured  from  week-to-week,  not  run-to-run),  has  been 
improved dramatically to about 1%. An example of two data runs taken
about four months apart, but under experimentally identical conditions, is
shown  in  Fig.  3. 


Figure  3.  Two  sets  of  data  taken  under  nearly  identical  conditions,  but  at  two  different  times. 
The absolute agreement of obtained data is within 1% from one run to the next.

This  summer  (1992),  Krusius  and  his  students  modified  their  model  to 
include  bandgap  renormalization  effects.  This  modification  has  had  a 
dramatic  impact  on  the  quality  of  the  fit  between  data  and  model.  We  now 
find  excellent  agreement  in  the  100-400  fsec  region,  while  there  is  still  some 
disagreement  for  data  beyond  500  fsec.  We  are  repeating  some  key 
measurements  at  this  time  to  determine  whether  the  model  or  the  data  is  at 
fault.  This  collaboration  has  finally  paid  off  in  terms  of  identifying  the  major 
physical  processes  that  occur  in  hot  carrier  relaxation  in  InGaAs. 

The  tuning  range  of  the  color  center  laser  used  in  these  measurements 
coincides  well  with  the  bandgap  of  InGaAs,  but  it  does  not  have  sufficient 
tuning  range  to  excite  electrons  more  than  about  2  LO  phonons  above  the 
bandgap. Based on work reported by Petricevic [4], we built a forsterite laser
that operates from 1.2 to 1.32 µm, or 0.93 to 1.03 eV. This energy range allows
the  excitation  of  carriers  to  about  250  meV  above  the  conduction  band  edge  in 
the  InGaAs  samples  we  are  using,  but  does  not  lift  them  high  enough  to 
allow  inter-valley  scattering.  This  probe  wavelength  is  similar  to  that  used  by 
Roskos et al. [1], except that we have much more power and a shorter pulse
width,  with  a  broader  tuning  range.  We  can  stably  generate  48  fsec  pulses  at 
an  80  MHz  rate  with  an  average  power  of  380  mW.  We  recently  frequency 
doubled  this  laser  to  generate  25  mW  of  femtosecond  pulses  at  630  nm.  Work 
is underway to apply this new source to some unique two-wavelength pulse
probe measurements. A paper has been submitted on the laser result [5], and
another is in preparation on the frequency doubling.

SCIENTIFIC IMPACT OF WORK

The  major  impact  of  this  work  lies  in  the  identification  of  carrier-carrier 
scattering  as  the  dominant  scattering  process  near  the  bandedge  of  InGaAs. 
The  carrier  densities  we  are  exploring  are  similar  to  those  used  in 
semiconductor  quantum  well  lasers,  where  high  speed  carrier  capture  is 
critical.  We  can  now  identify  characteristic  lifetimes  for  various  physical 
scattering  processes  in  this  material  system. 

Secondly, this data has been used to critically test the Monte Carlo
simulations of the Krusius group, and has provided a motivation to modify
and improve the model. Similarly, the modelling efforts have focussed our
attention on eliminating systematic errors, and we are confident that the
measurements we report represent the best data to date on relaxation
phenomena near the bandedge of InGaAs.

OBJECTIVE

The  objective  in  this  work  unit  is  to  explore  non-equilibrium  carrier 
processes  governing  electron  and  hole  transport  and  optical  interactions  in 
inhomogeneous  compound  semiconductor  heterostructures  theoretically. 
Electron  and  hole  interactions  among  themselves,  the  semiconductor  lattice, 
optical  fields,  and  external  electric  fields  are  described  using  self-consistent 
ensemble  Monte  Carlo  formulations.  This  work  is  done  collaboratively  with 
femtosecond  optical  measurements  and  materials  growth  efforts.  In  the  area 
of femtosecond optical probing with tunable lasers, joint work is performed
together with C. Pollock's research group in order to design samples and optical
experiments, analyze measured data, and extract microscopic information on
femtosecond carrier processes.

DISCUSSION  OF  STATE-OF-THE-ART 

The  relaxation  of  carriers  excited  by  ultra-short  optical  pulses  has  been 
intensely  studied  both  experimentally  and  theoretically  for  several  years. 
Despite this, the understanding of the complex carrier dynamics in these
highly non-equilibrium situations is still incomplete. On the theoretical side,
current  Monte  Carlo  models  of  carrier  relaxation  have  achieved  considerable 
success  in  explaining  many  qualitative  features  of  experimental  observations 
[1,  2].  However,  a  great  deal  of  uncertainty  remains.  At  the  heart  of  these 
uncertainties  lies  the  role  of  the  Coulomb  potential  in  the  carrier-carrier 
interaction.  This  interaction  expresses  itself  in  a  number  of  important 
phenomena.  Most  familiar  is  the  free  carrier  dielectric  function,  which  affects 
all  carrier  scattering  processes  through  screening,  but  the  Coulomb  potential 
also  enters  the  problem  through  carrier-carrier  scattering  and  two  carrier 
correlation  effects,  band  renormalization,  and  the  Coulomb  enhancement  of 
optical  absorption. 

Free  carrier  screening  and  carrier-carrier  scattering  have  received  a  large 
amount  of  attention.  Until  recently  models  universally  assumed  that  free 
carrier  screening  can  be  adequately  described  by  using  long  wavelength  static 
approximations.  However,  recent  results  have  indicated  that  carrier-carrier 
scattering  may  be  seriously  underestimated  in  a  static  screening  limit  [3-6]. 
Since the two most important scattering processes in compound
semiconductors, polar optic phonon scattering and carrier-carrier scattering, are
both heavily dependent on free carrier screening, this issue is critical in the
understanding of the role of these scattering mechanisms. The current
consensus is that static screening is generally an inadequate approximation.
A number of approximations to the full dielectric function, such as the
plasmon-pole approximation, are available, but these assume quasi-equilibrium
conditions, which are not generally present on the femtosecond timescale in
these  experiments.  Progress  in  developing  more  general  methods  has  been 
made  through  a  joint  ensemble  Monte  Carlo /molecular  dynamics  approach 
[7], which has had some success in correlating with measured data. However, this
method appears to be limited to homogeneous systems due to limitations
arising  from  the  size  of  the  area  that  can  be  simulated,  and  the  method  does 
not  naturally  provide  the  free  carrier  dielectric  function  required  for 
calculating  the  effect  of  screening  on  other  scattering  processes  and  Coulomb 
correlation  effects.  Thus  it  is  unlikely  that  this  method  can  be  applied  more 
widely  to  the  modeling  of  other  highly  non-equilibrium  phenomena,  which 
are important in describing high speed compound semiconductor devices.
Even  more  recently  an  extension  of  the  standard  Monte  Carlo  method  that 
includes  the  free  carrier  polarization  has  been  developed.  However,  this 
approach  has  to  be  classified  as  a  radical  departure  from  standard  Monte 
Carlo,  and  hence  it  is  not  clear  whether  it  retains  the  many  features  that  make 
Monte  Carlo  widely  used  [8]. 

Band  gap  renormalization  and  Coulomb  enhancement  are  processes  that 
have  received  relatively  little  attention  in  theoretical  studies.  Most  detailed 
simulations have completely neglected these effects [1,2,7]. The few efforts that
have included them typically use simple qualitative models. Nonetheless, a
number of experiments have shown effects attributed to these processes [9-12].
Since these processes are known to decrease in importance away from the band
edge,  experiments  far  from  the  band  edge  can  perhaps  be  understood  without 
them.  However,  their  effect  in  the  near  bandgap  regime,  where  the  present 
femtosecond  probing  experiments  are  performed,  is  expected  to  be  significant, 
and  thus  careful  investigations  must  be  launched. 

The  role  of  band  renormalization  and  Coulomb  enhancement  effects  in  the 
relaxation of near band gap carriers on the femtosecond timescale
has been the main focus in this task during the past year. No
major  new  issues  have  hence  been  explored  in  the  non-equilibrium 
transport  area. 

PROGRESS 

A  dual  carrier  ensemble  Monte  Carlo  program  (OPTMC)  capable  of 
simulating  near  band  gap  femtosecond  optical  pump-probe  experiments  in 
In0.53Ga0.47As thin films on transparent InP substrates has been developed.
This formulation describes all scattering processes, including carrier-carrier
scattering, and uses an accurate k·p band structure for both electrons and holes. Carrier transport
along  the  normal  to  the  thin  film  is  included  through  a  self-consistently 
calculated  electric  field.  Screening  was  initially  handled  self-consistently 
through  the  long  wavelength  limit  of  the  Lindhard  dielectric  function. 
Optical absorption was calculated using the instantaneous carrier distribution
function. This simulation code has been used in collaboration with
femtosecond laser probing experiments performed by C. Pollock's group in
task #3 of this JSEP program to explore the femtosecond dynamics of carriers
excited  within  100  meV  of  the  band  gap. 
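
For readers unfamiliar with ensemble Monte Carlo, the core step of such a simulation can be sketched generically. The code below is not the OPTMC program and makes strong simplifying assumptions: the scattering rates are hypothetical constants (in a real simulation they depend on carrier energy and on the evolving distribution), and only the textbook free-flight sampling with self-scattering and the cumulative-rate choice of mechanism are shown. All names and numbers are illustrative.

```python
import math
import random

# Hypothetical, energy-independent rates in 1/fs, chosen only for illustration.
RATES_PER_FS = {
    "polar_optical_phonon": 0.005,
    "carrier_carrier": 0.010,
}
# Upper bound on the total rate; the remainder is treated as self-scattering.
GAMMA_MAX_PER_FS = 0.020


def sample_free_flight_time(gamma_max, rng):
    """Draw an exponentially distributed free-flight time (units of 1/gamma_max)."""
    # 1 - rng.random() lies in (0, 1], which keeps the logarithm finite.
    return -math.log(1.0 - rng.random()) / gamma_max


def choose_mechanism(rates, gamma_max, rng):
    """Select a mechanism with probability rate / gamma_max, else self-scattering."""
    threshold = rng.random() * gamma_max
    cumulative = 0.0
    for name, rate in rates.items():
        cumulative += rate
        if threshold < cumulative:
            return name
    return "self_scattering"


def simulate_events(n_events, seed=0):
    """Return the mean free-flight time and the count of each selected mechanism."""
    rng = random.Random(seed)
    counts = {name: 0 for name in RATES_PER_FS}
    counts["self_scattering"] = 0
    total_time = 0.0
    for _ in range(n_events):
        total_time += sample_free_flight_time(GAMMA_MAX_PER_FS, rng)
        counts[choose_mechanism(RATES_PER_FS, GAMMA_MAX_PER_FS, rng)] += 1
    return total_time / n_events, counts


mean_time_fs, counts = simulate_events(20000)
print(f"mean free flight: {mean_time_fs:.1f} fs")
print(counts)
```

The self-scattering channel is a bookkeeping device: it lets the free-flight time be drawn from a single constant rate even when the physical rates vary, at the cost of some events that leave the carrier state unchanged.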

A great deal of effort has been expended, both experimentally and
theoretically,  in  this  area  in  recent  years.  However,  our  effort  is  in  many  ways 
unique.  Most  other  groups  have  utilized  the  capabilities  of  femtosecond  CPM 
lasers  in  the  AlGaAs/GaAs  system.  The  fact  that  this  is  not  a  tunable  laser 
and  has  a  fixed  photon  energy  of  about  2  eV  restricts  excitation  of  carriers  to 
states  very  high  in  the  band.  Carrier  relaxation  dynamics  in  this  energy  range 
are  dominated  by  intervalley  transfer  and  hence  most  of  the  work  has  focused 
on  determining  these  rates.  In  the  present  work  the  availability  of  the  unique 
tunable  femtosecond  color-center  laser  developed  by  C.  Pollock  has  been 
exploited. This laser enables In0.53Ga0.47As thin films to be excited over a
range  from  the  band  edge  to  100  meV  above  it.  Thus  the  complications  of 
intervalley  transfer  are  avoided  and  a  very  different  regime  of  carrier 
dynamics  can  be  explored. 

Due  to  the  critical  role  of  screening  in  the  formulation,  it  was  concluded  that 
the  consequences  of  the  approximations  made  in  the  static  screening  model 
needed  to  be  examined  in  greater  detail  in  this  femtosecond  domain.  To  this 
end  a  new  more  accurate  dynamic  screening  model  has  been  developed  and 
implemented  in  the  Monte  Carlo  code.  The  new  model  is  derived  from  the 
Lindhard  dielectric  function  and  fully  incorporates  the  energy  and 
wavelength  dependence  of  free  carrier  screening.  The  use  of  an  approximate 
parabolic  band  structure  and  the  neglect  of  the  anisotropy  in  the  bands, 
electron  and  hole  carrier  distributions,  and  the  dielectric  function  were  the 
only  simplifications  made.  Comparison  of  the  dielectric  functions  derived 
from  the  two  models  resulted  in  a  number  of  critical  differences.  With 
dynamic screening the dielectric function falls off rapidly with energy,
resulting  in  significantly  reduced  screening  for  inelastic  scattering  processes. 
The  dynamic  dielectric  function  also  falls  off  faster  at  small  wavelengths 
resulting  from  a  more  accurate  inclusion  of  the  finite  size  of  the  carrier  wave 
packets  compared  to  the  static  long  wavelength  approach.  The  plasma 
resonance  at  high  energy  and  long  wavelength  is  also  properly  included  in 
the  dynamic  model  resulting  in  the  enhancement  of  some  high  energy 
carrier-carrier  interactions.  Further,  at  early  times  in  the  simulations  (within 
100  fs  of  the  center  of  the  exciting  pulse)  the  extremely  non-equilibrium 
distribution  function  results  in  an  unexpected  plasma  resonance  with  the 
heavy holes taking the form of a narrow ridge running from small energies
and large wavelengths to large energies and small wavelengths diagonally
across  the  dielectric  function.  We  have  found  that  the  dielectric  response  of 
the  holes  is  not  fully  integrated  into  the  total  dielectric  function  and  as  long  as 
the  holes  are  constrained  to  this  narrow  energy  region  the  plasma  will  exhibit 
a  second  plasmon  spectrum  due  to  the  heavy-holes.  This  results  from  the  fact 
that  the  heavy  holes  are  excited  into  an  extremely  small  energy  range  initially 
(2-3  meV).  As  the  heavy  hole  distribution  relaxes  toward  equilibrium  the 
resonance  shrinks  in  size  and  eventually  disappears  as  the  holes  are  dispersed 
throughout  the  band.  The  significance  of  this  effect  for  the  simulations 
discussed  here  is  that  the  resonance  crosses  a  region  of  momentum-energy 
space important to a number of carrier-carrier scattering processes and
results in a significant enhancement of these processes at ultrashort times.
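
To give a feel for the energy scale of a plasma resonance, the sketch below evaluates the textbook long-wavelength plasma frequency, omega_p = sqrt(n e^2 / (eps0 eps_inf m*)). It is not the Lindhard calculation used in the simulations, and it treats a single carrier species. The carrier density, effective mass ratio, and dielectric constant in the example call are assumed illustrative values, not numbers taken from this report.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
M0 = 9.1093837015e-31        # free electron mass, kg
HBAR = 1.054571817e-34       # reduced Planck constant, J*s


def plasmon_energy_mev(density_cm3, eff_mass_ratio, eps_inf):
    """Long-wavelength plasmon energy in meV for one carrier species."""
    density_m3 = density_cm3 * 1e6
    omega_p = math.sqrt(
        density_m3 * E_CHARGE ** 2 / (EPS0 * eps_inf * eff_mass_ratio * M0)
    )
    return HBAR * omega_p / E_CHARGE * 1e3


# Assumed values for illustration only: 1e18 cm^-3 electrons,
# effective mass 0.041 m0, high-frequency dielectric constant 11.6.
print(f"{plasmon_energy_mev(1e18, 0.041, 11.6):.1f} meV")
```

The plasmon energy scales as the square root of the density and inversely as the square root of the effective mass, so a heavy-hole plasma at the same density resonates at a considerably lower energy than the electron plasma.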

The  new  dynamic  screening  model  resulted  in  dramatically  faster  relaxation 
times  when  compared  to  the  static  screening  model.  This  is  primarily  the 
result  of  two  effects.  Within  this  formulation  scattering  processes  involving 
optical  phonons  are  largely  unscreened  because  of  the  large  energy  and 
momentum  transfers  involved.  Thus  the  resulting  electron-optical  phonon 
scattering  rates  are  nearly  doubled,  while  heavy  hole-optical  phonon  rates 
show  only  a  small  increase  because  of  the  extremely  large  transfers  of 
momentum  involved.  As  anticipated  the  carrier-carrier  scattering  rates  were 
greatly  enhanced  within  the  new  screening  model.  At  longer  times  (beyond 
100  fs)  when  the  carrier  distribution  functions  are  approaching  an 
equilibrium  distribution,  the  electron-electron  scattering  rates  are  found  to  be 
approximately  doubled  while  the  heavy  hole-heavy  hole  rate  shows  a  10-15% 
increase.  The  effect  on  heavy  hole-heavy  hole  scattering  is  weaker,  because  it 
results  in  smaller  energy  transfers  but  large  momentum  transfers.  This  heavy 
hole  scattering  rate  benefits  less  from  the  energy  dependence  of  the  dynamic 
dielectric  function  and  at  the  same  time  is  more  weakly  screened  initially. 
Electron-hole  scattering  is  negligibly  affected  because  it  is  approximately 
elastic.  At  early  times,  interaction  with  the  heavy-hole  resonance  results  in  a 
spectacular  enhancement  in  all  carrier-carrier  scattering  rates  except  the  heavy 
hole-heavy hole interaction. Thus we find that the use of dynamic screening
results  in  a  large  increase  in  the  magnitude  of  the  dominant  scattering  rates,  a 
finding  important  for  many  non-equilibrium  transport  conditions.  Here  the 
effect  of  dynamic  screening  is  reflected  in  the  decrease  of  the  relaxation  time 
extracted  from  the  simulated  probe  transmission,  from  200  to  145  fs  for  a 
pulse  consisting  of  810  meV  photons  with  an  intensity  of  5.0  x  10^3  eV/cm^. 
This  exhibits  itself  in  the  distribution  functions  as  an  extremely  rapid 
appearance  of  carriers  at  the  band  edge  and  an  extremely  rapid  washing  out  of 
the  excitation  peak  structure  as  well  as  the  elimination  of  distinct  phonon 
replicas. Optical phonon scattering is still an important scattering process, but
carrier-carrier scattering has increased greatly in its relative importance and is
now found to contribute significantly to the total scattering times. Further
details of this effect are discussed in JSEP publication [3].

Investigations have been completed examining the effect of the energy of the
exciting  photons  and  the  intensity  of  the  exciting  pulse  and  comparing  these 
to  the  results  from  static  screening.  In  both  cases  the  results  are  qualitatively 
similar  to  those  for  static  screening  although  with  significantly  reduced 
relaxation times. The extracted relaxation times still decrease with increased
photon energy; however, the distinct shoulder at the first phonon threshold
has disappeared. This indicates the increased importance of carrier-carrier
scattering,  which  serves  to  wash  out  sharp  features.  The  relaxation  times  are 
still  found  to  increase  with  pulse  intensity  although  to  a  lesser  degree.  In  the 
dynamic  model  enhanced  screening  with  increased  carrier  density  plays  only 
a  small  role  with  the  dominant  effect  being  the  degeneracy  in  the  conduction 
band,  which  reduces  electron  scattering  rates  and  consequently  increases 
relaxation  times. 

New  attempts  to  correlate  with  recent  measured  data  from  the  group  of  C. 
Pollock  have  been  undertaken  with  the  new  dynamic  screening  model. 
Significant  gains  in  describing  carrier  behavior  have  been  made  with 
relaxation  times  derived  from  the  dynamic  model.  The  resulting  simulated 
results are much closer to those determined from measured data than
those extracted from the static screening model. Still, the differences are larger
than  could  be  removed  by  tuning  the  model  parameters  and  research  has 
been  continued  to  determine  what  significant  physical  process  is  responsible. 

Consequently  two  additional  physical  processes,  band  renormalization  and 
the  Coulomb  enhancement  of  the  optical  matrix  elements,  were  investigated. 
Band  renormalization  has  been  included  within  the  quasi-static 
approximation  of  Haug  and  Schmitt-Rink  [13].  This  formulation  allows  the 
renormalization  for  both  the  valence  and  conduction  bands  to  be  calculated 
from  the  carrier  distribution  functions  and  free  carrier  dielectric  function.  As 
implemented in OPTMC the renormalizations are calculated at each time step
self-consistently  using  the  instantaneous  values  of  these  functions  available 
within  the  simulation.  The  result  is  then  included  in  the  simulation  through 
a  rigid  band  shift  for  each  band.  The  conduction  and  valence  bands  are  each 
shifted in energy by an amount calculated at their respective band edges (Γ
point).  The  effect  of  the  band  gap  shrinkage  is  fully  included  in  the  model  of 
optical  interactions.  Both  the  shift  in  energy  of  carriers  excited  by  photons  of  a 
given  energy  and  the  decrease  in  the  photon  absorption  length  are  included. 
The  variation  in  the  position  of  the  bands  across  the  thickness  of  the  sample 
arising from variations in the carrier density is included within the same
interpolation  scheme  used  for  the  carrier  distribution  functions  and  dielectric 
function.  Transport  effects  arising  from  this  spatial  dependence  are  dealt  with 
by  including  the  renormalization  energy  for  the  band  as  an  effective  potential 
in the semi-classical equations of motion for the carriers in that band.

The  effect  of  band  renormalization  on  the  simulated  results  was  found  to  be 
quite  dramatic,  and  for  the  range  of  photon  energies  of  interest  here  (20-60 
meV above the band edge) of even greater importance than dynamic screening.
For example, simulated pulse-probe experiments performed assuming an
excitation pulse energy of 1.9 x 10^3 eV/cm^ and a photon energy of 0.87 eV (37
meV above the band edge) show a 199 fs effective relaxation time over the range
200-800 fs of the delay between the pump and probe pulses for a simple model
with  only  static  screening.  With  dynamic  screening  the  result  is  169  fs,  while 
for  static  screening  with  band  renormalization,  and  dynamic  screening  with 
band  renormalization  the  results  are  139  and  100  fs  respectively.  Thus  the 
band  renormalization  is  the  larger  of  the  two  effects.  These  results  can  be 
compared  to  the  experimental  result  for  a  dual-pulse-correlation  experiment 
of  116  fs.  In  this  case  there  is  a  reduction  in  the  band  gap  of  approximately  13 
meV, which is a significant portion of the initial excess energy of 37 meV, and
results  in  a  significant  change  in  the  relationship  between  the  pump  pulse 
and  the  excited  carrier  distribution.  Not  surprisingly  band  renormalization 
takes  on  increasing  importance  when  the  band  edge  is  approached.  Details  of 
these  results  are  available  in  JSEP  publications  [3, 4]. 
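
The relative size of the two effects can be read off from the four relaxation times quoted above. The short sketch below is only arithmetic on those reported numbers; the dictionary keys and the helper name `percent_change` are illustrative and are not part of OPTMC.

```python
# Effective relaxation times (fs) quoted in the text for the 0.87 eV case.
simulated_fs = {
    "static screening": 199.0,
    "dynamic screening": 169.0,
    "static screening + band renormalization": 139.0,
    "dynamic screening + band renormalization": 100.0,
}
measured_fs = 116.0  # dual-pulse-correlation experiment quoted in the text


def percent_change(value, reference):
    """Signed change of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


baseline = simulated_fs["static screening"]
for model, tau in simulated_fs.items():
    print(f"{model:42s} {tau:5.0f} fs  "
          f"{percent_change(tau, baseline):+6.1f}% vs static  "
          f"{percent_change(tau, measured_fs):+6.1f}% vs experiment")
```

With these numbers, dynamic screening alone shortens the simulated time by about 15%, band renormalization alone by about 30%, and the combined model lands about 14% below the measured 116 fs.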

A  detailed  comparison  of  the  results  of  this  model  with  the  experiments  of  C. 
Pollock’s  group  indicates  that  the  simulation  now  reproduces  measured 
optical  transmission  data  as  a  function  of  probe  delay  fairly  accurately.  The 
initial  transients  are  well  reproduced  with  effective  relaxation  times  agreeing 
with  experiment  within  10-15%  without  any  adjustable  parameters.  It  seems 
clear  that  this  model  contains  the  physics  necessary  to  provide  a  fundamental 
understanding  of  these  experiments.  However  some  details  still  need  to  be 
addressed  for  a  full  quantitative  understanding.  The  simulation  does  not 
generally  reproduce  the  long  time  measured  transients  (larger  than  600  fs) 
accurately  and  as  a  result  does  not  describe  the  small  variations  seen  as  a 
function of photon energy and excitation pulse intensity, which are largely a
result of changes at these late times. It should be emphasized, however, that
the  remaining  problems  are  small  details  and  that  the  basic  response  is  well 
reproduced.  We  believe  that  this  model  provides  an  adequate  basis  for  the 
fundamental  understanding  and  analysis  of  these  femtosecond  experiments. 

Some  of  the  remaining  discrepancy  is  likely  the  result  of  significant 
uncertainty  in  some  of  the  materials  parameters  and  can  likely  be  removed  by 
fine tuning the model. However, it has become clear that full agreement with
experiment cannot be obtained by tuning within the physically reasonable
range of material parameters. Preliminary results do indicate
that  much  of  the  remaining  discrepancy  can  be  addressed  through  the 
inclusion  of  the  Coulomb  enhancement  of  the  optical  transition  elements.  In 
this  process  the  Coulomb  interaction  between  the  excited  electron-hole  pair 
enhances  the  effective  transition  rate.  This  process  becomes  increasingly 
important  as  the  band  edge  is  approached  and  can  be  included  as  a 
multiplicative  factor  in  the  optical  transition  matrix  element.  It  has  been 
implemented  within  OPTMC  through  a  phenomenological  formulation 
developed  by  Banyai  and  Koch  [14].  This  formulation  allows  the  inclusion  of 
the effects of free carrier screening on the enhancement and is implemented
self-consistently  within  the  Monte  Carlo  method.  Early  results  indicate  that 
this  improvement  will  significantly  reduce  remaining  discrepancies  with 
experiment.  Specifically  the  accuracy  of  the  long  time  transients  has  been 
improved  and  correct  trends  as  a  function  of  pulse  intensity  have  been 
restored.  These  results  will  be  discussed  in  detail  in  JSEP  publication  [5]. 

SCIENTIFIC IMPACT OF RESEARCH

Free  carrier  screening  and  band  renormalization  have  been  shown  to  be 
critical  processes  (beyond  the  usual  scattering  mechanisms)  for 
understanding  carrier  relaxation  on  the  femtosecond  timescale  in  the  near 
bandgap  regime.  We  think  that  the  Coulomb  enhancement  may  also  be  an 
important  process  in  this  regime.  In  order  to  quantitatively  investigate  the 
effect  of  these  processes  new  and  unique  techniques  have  been  developed  to 
allow their inclusion in the standard Monte Carlo method with a minimum
of assumptions. Static screening appears inadequate for modeling
femtosecond carrier relaxation. The inclusion of dynamic screening provides
a much more accurate understanding of the microscopic processes involved
and of the increased importance of carrier-carrier scattering under dynamic
screening.

Based  on  our  findings  it  appears  that  the  effect  of  carrier-carrier  scattering 
should  be  reassessed  in  other  situations  where  highly  non-equilibrium 
distribution  functions  are  involved.  Such  conditions  are  found  in  state  of  the 
art  high  speed  heterojunction  devices.  Our  results  further  suggest  that  band 
renormalization  may  play  a  non-negligible  role  in  many  semiconductor 
devices,  in  which  it  has  not  yet  been  considered.  Finally  in  our  work  on  the 
simulation  of  femtosecond  optical  experiments  we  have  developed  the 
formulations and numerical methods required to quantitatively
investigate  optoelectronic  devices,  such  as  light  detectors  and  sources,  on  the 
first  principles  level.  This  is  the  direction  we  have  proposed  to  take  in  the 
continuation task within the new JSEP program.

OBJECTIVE

Adaptive  computational  schemes  are  typical  in  real-time  signal  processing 
where  the  data  collected  from  the  environment  rapidly  changes  in  time.  A 
basic  step  in  adaptive  processing  is  to  discard  a  portion  of  the  "old"  data  which 
no  longer  represents  the  environment,  add  new  data,  and  then  "adapt"  the 
current  knowledge  about  the  environment  according  to  the  change  in  the 
data.  Such  processing  arises  for  example  in  sensor  array  processing  [1].  Sensor 
array  processing  is  characterized  by  the  need  to  implement  multiple 
algorithms (e.g., weight vector computation and application, eigenanalysis,
residual  evaluation,  etc.).  Serial  implementation  of  such  multiple  tasks  most 
often  can  be  readily  done  via  standard  numerical  schemes:  the  challenge  in 
multiple  tasking  lies  in  devising  efficient  parallel  implementations  which  are 
needed  in  real-time  analysis  of  data  derived  from  an  array  of  sensors.  Our 
three major objectives are: (i) development of strategies for adding and
deleting  information  from  the  covariance  matrix  in  multi-direction 
beamforming  (least  squares),  (ii)  development  of  strategies  for  tracking  the 
eigenstructure  of  the  array  data  after  addition  and  deletion  of  data  (covariance 
differencing),  (iii)  evaluation  of  procedures  in  (i)  and  (ii)  on  emerging  parallel 
processor  architectures. 

DISCUSSION OF STATE-OF-THE-ART

Least  squares  problems  are  ubiquitous  in  engineering,  science,  operations 
research,  etc.  The  linear  least  squares  problem  can  be  posed  as  follows: 

Given an n x p matrix A, p < n, and a vector b, find the vector x which solves

$$\min_x \; (Ax - b)^\dagger (Ax - b) \qquad (1)$$

Here $\dagger$ denotes conjugate transposition. As long as the data matrix A has full
rank the solution is given by the solution to the corresponding normal
equations,

$$x = (A^\dagger A)^{-1} A^\dagger b \qquad (2)$$

The quantity $K = A^\dagger A$ is referred to in the signal processing literature as the
covariance matrix, and x is referred to as the weight vector.

In  applications  various  constraints  are  imposed  on  the  weight  vector  x. 
Typical  constraints  are  linear  equality  constraints,  linear  inequality 
constraints,  or  quadratic  constraints. 

Because K has twice the dynamic range of the data matrix A, it requires about
twice  the  wordlength  to  work  with  on  a  computer,  for  a  given  problem. 
Therefore  it  is  often  advisable  to  not  form  K  explicitly.  The  method  of  choice 
for  solving  full  rank  least  squares  equations  is  to  proceed  by  a  unitary 
transformation  Q  that  "compresses"  the  data  matrix  A  to  the  "information 
equivalent"  triangular  matrix  U.  This  triangular  matrix  is  known  as  a 
Cholesky  factor  of  K.  The  desired  least  squares  solution  is  next  determined  by 
solving  the  corresponding  triangular  system  of  linear  equations. 
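
The triangularization route described above can be sketched in a few lines. This is a minimal illustration for real data, assuming Givens rotations as the unitary transformation (the text does not say which transformation is used); the function names `givens`, `triangularize`, `back_substitute` and `least_squares` are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) such that c*a + s*b = r and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def triangularize(rows):
    """Reduce a dense matrix (list of row lists) to upper triangular form
    in place with Givens rotations, and return it."""
    n, m = len(rows), len(rows[0])
    for j in range(min(n, m)):
        for i in range(j + 1, n):
            if rows[i][j] == 0.0:
                continue
            c, s = givens(rows[j][j], rows[i][j])
            for col in range(j, m):
                top, bot = rows[j][col], rows[i][col]
                rows[j][col] = c * top + s * bot
                rows[i][col] = -s * top + c * bot
    return rows


def back_substitute(U, u):
    """Solve U x = u for an upper triangular p x p matrix U."""
    p = len(u)
    x = [0.0] * p
    for i in range(p - 1, -1, -1):
        acc = u[i] - sum(U[i][j] * x[j] for j in range(i + 1, p))
        x[i] = acc / U[i][i]
    return x


def least_squares(A, b):
    """Solve min ||Ax - b||_2 by triangularizing the augmented data (A b),
    so that the covariance matrix A^T A is never formed."""
    p = len(A[0])
    M = triangularize([list(row) + [bi] for row, bi in zip(A, b)])
    U = [M[i][:p] for i in range(p)]
    u = [M[i][p] for i in range(p)]
    rho = abs(M[p][p]) if len(M) > p else 0.0  # two-norm of the residual
    return back_substitute(U, u), rho


# Fit a straight line to four points (illustrative data).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [0.0, 1.0, 1.0, 2.0]
x, rho = least_squares(A, b)
print(x, rho)
```

Working on the augmented matrix (A b) yields the triangular factor, the transformed right-hand side and the residual norm in one pass, which is the form that the updating and downdating schemes discussed in this section operate on.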

In recursive least squares, equation (1) needs to be solved repeatedly after
some  rows  of  A  are  removed  and  additional  rows  are  added.  The  addition 
and  the  deletion  are  known  as  updating  and  downdating  the  Cholesky  factor, 
respectively,  or  simply  as  a  modification  of  the  Cholesky  factor.  The  reasons 
for  updating  and  downdating  vary.  Perhaps  the  data  to  be  deleted  is 
unrepresentative of the data at large and so its effects on the weight vector (or
parameter  estimate)  x  must  be  excised  (robust  statistics).  Or  perhaps  the  data 
is  changing  with  time  and  old  data  must  be  deleted  (adaptive  beamforming). 
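
As a concrete picture of what updating and downdating do to the triangular factor, the sketch below adds and then removes a single real row. It uses textbook Givens rotations for the update and plain hyperbolic rotations for the downdate; these are stand-ins chosen for brevity, not the stabilized schemes of [4] or [6], and the names `update`, `downdate` and `gram` are hypothetical.

```python
import math


def update(R, z):
    """Return the upper triangular factor of R^T R + z z^T (Givens rotations)."""
    p = len(z)
    R = [row[:] for row in R]
    z = list(z)
    for j in range(p):
        r = math.hypot(R[j][j], z[j])
        if r == 0.0:
            continue
        c, s = R[j][j] / r, z[j] / r
        for k in range(j, p):
            top, bot = R[j][k], z[k]
            R[j][k] = c * top + s * bot
            z[k] = -s * top + c * bot
    return R


def downdate(R, z):
    """Return the upper triangular factor of R^T R - z z^T (hyperbolic rotations).

    Raises ValueError when the downdated matrix is not positive definite.
    """
    p = len(z)
    R = [row[:] for row in R]
    z = list(z)
    for j in range(p):
        if abs(z[j]) >= abs(R[j][j]):
            raise ValueError("downdate would destroy positive definiteness")
        t = z[j] / R[j][j]
        c = 1.0 / math.sqrt(1.0 - t * t)  # hyperbolic cosine
        s = t * c                          # hyperbolic sine
        for k in range(j, p):
            top, bot = R[j][k], z[k]
            R[j][k] = c * top - s * bot
            z[k] = -s * top + c * bot
    return R


def gram(R):
    """Return R^T R, the covariance matrix represented by the factor R."""
    p = len(R)
    return [[sum(R[k][i] * R[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]


R = [[2.0, 1.0], [0.0, 3.0]]
z = [1.0, 2.0]
R_up = update(R, z)         # row z added to the data
R_back = downdate(R_up, z)  # row z removed again
print(gram(R_up))
print(R_back)
```

The division by `sqrt(1 - t*t)` in the downdate is where sensitivity to rounding enters: when the row being removed carries most of the information in a column, `t` approaches 1 and small errors are amplified.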

The  combined  process  of  updating  and  downdating  the  Cholesky  factor  is 
called  a  sliding  rectangular  window  process,  and  is  the  focus  of  this  work.  An 
alternative  to  a  sliding  window  is  an  exponential  weighting,  where  all  past 
data  is  employed  in  the  least  squares  estimate,  albeit  with  an  increasingly 
smaller weighting, controllable by the decay rate of the window [2]. Single
vector updating for exponential weighting and an efficient parallel
implementation of the process are described in [3]. In this work we are
concerned  with  the  (slowly  varying)  nonstationary  case  which  necessitates 
multiple  vector  addition/ deletion  from  the  data  matrix. 
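
The two windowing strategies can be contrasted at the level of the covariance matrix. The relations below are a sketch under assumed notation: $Z_{\mathrm{new}}$ and $Z_{\mathrm{old}}$ denote the blocks of rows entering and leaving the window, $a_t$ is the data row arriving at step $t$, and $\lambda$ is a forgetting factor; none of these symbols appear in the original text.

```latex
% Sliding rectangular window: one block update followed by one block downdate
\tilde{K} = K + Z_{\mathrm{new}}^{\dagger} Z_{\mathrm{new}}
              - Z_{\mathrm{old}}^{\dagger} Z_{\mathrm{old}}

% Exponential weighting: a single update per step, old data decays geometrically
K_t = \lambda K_{t-1} + a_t^{\dagger} a_t , \qquad 0 < \lambda \le 1
```

Only the sliding window involves a subtraction, which is why it requires downdating of the Cholesky factor while exponential weighting needs updating alone.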

Processing of recursive least squares problems on sequential machines is
now well understood. It is known that while the updating process is
numerically sound, the downdating can be very sensitive to rounding errors
[4]. The most numerically accurate downdating methods require formation
of Q and downdating Q itself [5]. This however results in quite high
computational  cost,  and  additional  memory  requirements  for  storing  Q.  One 
scheme  that  avoids  computing  Q  is  a  single  vector  downdating  scheme  based 
on  plane  rotations  [4].  Another  way  of  downdating  the  Cholesky  factor  is  the 
so-called  Stabilized  Hyperbolic  Householder  scheme  [6]  (see  also  [7])  which  is 
half  as  expensive  (for  multiple  vector  updating/downdating  problems),  in 
terms of number of operations, as the method described in [4], and hence the
preferable method for a sequential rectangular sliding window process.

In  parallel  computing  the  additional  cost  of  interprocessor  communication 
has  to  be  taken  into  account  in  assessing  the  cost  of  executing  algorithms. 
Most  discussions  surrounding  multiprocessor  computers  for  signal 
processing have centered on planar (triangular) arrays [8], [9], [3], [10], [11], [12].
Perhaps  the  sole  exception  has  been  the  important  contribution  by  Rader  in 
[13].  Both  triangular  and  linear  arrays  considered  in  [10]  or  [13]  are  designed  to 
implement  efficiently  the  exponential  weighting  method.  The  exponential 
weighting  method  is  very  attractive  for  parallel  implementation  as  it  can  be 
realized  by  a  single  updating  process.  On  the  other  hand,  the  sliding  window 
process is a composite task in the sense that each recursive step involves
updating  and  downdating  of  the  triangular  factor  followed  by  solving  the 
resulting  triangular  systems  of  linear  equations.  None  of  the  architectures 
proposed  in  [3]  or  [13]  can  efficiently  deal  with  the  sliding  window  process 
described  above. 

This  work  is  interested  in  updating  and  downdating  methods  that  can  add  or 
remove  an  arbitrary  number  of  rows  at  a  time.  The  advantage  of  such 
methods  over  repetitive  applications  of  single  row  updating  and  downdating 
is  that  they  make  use  of  matrix-vector  and  matrix-matrix  operations,  as 
opposed  to  mainly  vector-vector  operations  for  single  row  methods.  This 
may  make  such  block  methods  faster  on  processors  with  caches  and  parallel 
computers. 

PROGRESS 

We  have  produced  algorithms  that  can  update  and  downdate  an  arbitrary 
number  of  rows  from  the  triangular  factor  corresponding  to  the  least  squares 
problem.  Below  we  describe  the  method  of  removing  an  arbitrary  number  of 
rows  from  the  data  (downdating).  Updating  (adding  rows  to  the  data)  can  be 
derived  in  a  similar  way. 

Assume  we  wish  to  remove  the  first  k  rows,  (Z  d),  of  the  data  (A  b).  This 
would  be  a  rank-k  downdating  of  the  linear  least  squares  problem.  To 
accomplish  this,  we  must  first  examine  the  QR  factorization  of  an  augmented 
problem, 

[(z  d) 

U  t) 

where  R  is  the  upper  triangular  factor  of  the  data  matrix  (A  b).  Note  that 
from  this  equation 
implying that the first k rows of the orthogonal columns are orthogonal as
well. 


Now  let  H  be  an  orthogonal  matrix  such  that 
where U is upper triangular. U is the triangular factor corresponding to the
downdated data, that is, the rows of (A b) that remain after (Z d) is removed.
Then we solve U x = u, and we are done. The two-norm of the residual
corresponding to the downdated problem is $\tilde{\rho}$.


The relation (3) is fundamental to all rank-k downdating methods we have
developed. The major quantity of interest is (Q11 Q12) and the downdating
methods will differ in how Q12 is computed.


Methods  that  maintain  Q  and  R  or  just  R  throughout  a  recursive  least  squares 
problem  are  described  below: 

Classical Gram-Schmidt on an augmented problem. The first method is to use
classical Gram-Schmidt with reorthogonalization (CGS) to build on the
orthonormal columns already given. Equation (3) represents a QR
factorization, which could have been accomplished by classical
Gram-Schmidt. Since we have Q11, Q21, and R already, we have completed n
iterations. (Note that the ith iteration of classical Gram-Schmidt produces the
ith column of Q and the ith column of R.) We may then proceed with the
remaining  k  iterations  of  the  orthogonalization  process  to  get  the  new 
orthogonal  columns. 
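
The idea of continuing a partially completed Gram-Schmidt process can be sketched as follows. This is a generic illustration of classical Gram-Schmidt with one reorthogonalization pass for real vectors, not the authors' implementation; `cgs2_append` and `dot` are hypothetical names, and columns are stored as plain Python lists.

```python
import math


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def cgs2_append(Q, v):
    """Orthogonalize v against the orthonormal columns in Q (a list of
    column vectors) using classical Gram-Schmidt applied twice.

    Returns (q, r, norm) with v = sum_i r[i] * Q[i] + norm * q, where q is
    the new unit column.
    """
    w = list(v)
    r = [0.0] * len(Q)
    for _ in range(2):  # the second pass is the reorthogonalization
        # Classical variant: all projections are taken from the same w.
        coeffs = [dot(q, w) for q in Q]
        r = [ri + c for ri, c in zip(r, coeffs)]
        w = [wk - sum(c * q[k] for q, c in zip(Q, coeffs))
             for k, wk in enumerate(w)]
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("v lies in the span of the existing columns")
    return [wk / norm for wk in w], r, norm


# Two orthonormal columns are already available; append one more.
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
q_new, r, norm = cgs2_append(Q, [1.0, 2.0, 2.0])
print(q_new, r, norm)
```

Calling `cgs2_append` once per remaining column, and appending each result to `Q`, corresponds to carrying out the remaining iterations of the orthogonalization.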


Modified  Gram-Schmidt  on  an  augmented  problem.  Modified  Gram-Schmidt 
(MGS)  can  also  be  used  to  get  the  new  orthonormal  columns.  The  ith 
iteration of MGS produces the ith column of Q and the ith row of R in (3) and
updates  the  columns  of  Q  to  be  formed  in  later  iterations.  Again  we  use  the 
fact  that  (3)  represents  a  QR  factorization  that  is  partially  completed.  After 
n  +1  iterations  of  MGS,  we  have  the  following: 
The only term in the above that we do not have is T. We can find
T by performing only the update portion of MGS (i.e., T = T - q q^T T) for each
of the n + 1 orthogonal columns q, in ascending order, on the new k columns of T.

Using Corrected Semi-Normal Equations. This method is numerically one of the
best that maintain only R and not Q (but it uses the original data
(A b)). The basis of this method is in corrected semi-normal equations [14].

The method computes T in (6) by first solving the normal equations
corresponding  to  the  linear  least  squares  problem.  The  accuracy  of  T  is  next 
improved  by  the  technique  of  iterative  refinement.  Finally,  the  QR 
decomposition  of  T  produces  the  desired  submatrix  Q12. 
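
The semi-normal equations with one correction step can be sketched for an ordinary least squares solve. The example below is an illustration only: it solves for a single right-hand side b rather than for the matrix T of the text, and it obtains R from a Cholesky factorization of A^T A purely as a convenient stand-in for the factor a sliding window method would already hold. All function names are hypothetical.

```python
import math


def cholesky_upper(K):
    """Return upper triangular R with R^T R = K (K symmetric positive definite)."""
    p = len(K)
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        R[i][i] = math.sqrt(K[i][i] - sum(R[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, p):
            R[i][j] = (K[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R


def solve_rtr(R, g):
    """Solve R^T R x = g with one forward and one back substitution."""
    p = len(g)
    y = [0.0] * p
    for i in range(p):                    # R^T y = g (lower triangular)
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i))) / R[i][i]
    x = [0.0] * p
    for i in range(p - 1, -1, -1):        # R x = y (upper triangular)
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, p))) / R[i][i]
    return x


def at_times(A, v):
    """Return A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def csne(A, b, R):
    """Semi-normal equations followed by one step of iterative refinement."""
    x = solve_rtr(R, at_times(A, b))
    residual = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    dx = solve_rtr(R, at_times(A, residual))  # correction from the residual
    return [xi + di for xi, di in zip(x, dx)]


A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [0.0, 1.0, 1.0, 2.0]
K = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
R = cholesky_upper(K)
print(csne(A, b, R))
```

The correction step uses only R and the original data, which is why no Q needs to be stored.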

Numerical  Results 


The  following  two  figures  display  the  relative  error  of  solutions  for  each  of 
the  three  methods  described.  Both  tests  represent  a  window  of  size  8  sliding 
down a 50 x 5 matrix three rows at a time (i.e., three rows are updated and
three  rows  are  downdated  each  time  the  window  "slides"  down).  The  first 
matrix  tested  in  Figure  1  is  well-conditioned  except  for  a  large  element  at 
entry  (18,3).  The  second  matrix  tested  in  Figure  2  is  an  overall  ill-conditioned 
matrix,  especially  in  the  central  rows.  Each  method  is  compared  against  a  full 
QR  decomposition  method,  and  a  method  extended  from  LINPACK.  The  full 
QR  method  should  result  in  the  best  possible  accuracy  since  it  has  complete 
information,  but  has  the  resulting  disadvantage  of  being  slow  from 
maintaining  all  of  its  information.  The  LINPACK  method  is  an  example  of  a 
"maintain  R  only"  method  that  is  well  known  and  commonly  used. 

Figures  1  and  2  illustrate  that  the  methods  developed  in  this  work  achieve 
the  numerical  accuracy  of  a  full  QR  decomposition  for  numerically  "bad" 
sliding  window  scenarios  that  commonly  occur  in  practice.  However,  the 
methods  presented  involve  less  work  than  a  full  QR  decomposition  method. 
Note  also  that  our  newly  developed  methods  perform  significantly  better 
than  the  established  method  of  LINPACK. 

Parallel  results 

Distributed  memory  parallel  architectures  are  well  suited  for  matrix 
processing  as  application  and  machine  topology  are  often  well  matched.  The 
critical  aspect  in  the  performance  of  matrix  computations  on  such 
architectures  is  the  distribution  of  the  matrix  data  among  the  processors. 
Standard  matrix  data  distributions  are  by  rows,  columns,  or  subblocks,  either 
block-  or  wrap-distributed  to  the  processors.  The  choice  of  data  distribution 
affects communication requirements, and the load balance among the
processors. 
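
The difference between block and wrap distribution can be made concrete with a small mapping of matrix columns to processors. The sketch below is illustrative only; the function names and the choice of distributing by columns are assumptions, and it models ownership rather than any actual iPSC/860 code.

```python
def block_owner(j, n, nprocs):
    """Processor owning column j when n columns are split into contiguous blocks."""
    block = -(-n // nprocs)  # ceil(n / nprocs)
    return j // block


def wrap_owner(j, nprocs):
    """Processor owning column j under wrap (cyclic) distribution."""
    return j % nprocs


def active_processors(owner_of, first_col, n):
    """Processors that still own at least one column in first_col..n-1."""
    return sorted({owner_of(j) for j in range(first_col, n)})


n, nprocs = 8, 4
block_map = [block_owner(j, n, nprocs) for j in range(n)]
wrap_map = [wrap_owner(j, nprocs) for j in range(n)]
print("block:", block_map)
print("wrap: ", wrap_map)

# Once the first half of a triangular matrix has been processed, fewer
# processors hold remaining work under block distribution than under wrap.
print(active_processors(lambda j: block_owner(j, n, nprocs), 4, n))
print(active_processors(lambda j: wrap_owner(j, nprocs), 4, n))
```

Wrap distribution keeps every processor busy for longer, while block distribution keeps neighboring columns on the same processor and so tends to need fewer messages; which effect dominates depends on the algorithm and the machine.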


RLS Rank-3 Modifications: Spurious Element Test

Figure 1. Sliding Window Test - Spurious Element.

RLS Rank-3 Modifications: Varying Ill-Conditioned Data

Figure 2. Sliding Window Test - Varying Ill-Conditioned Data.

In  light  of  this,  we  have  examined  experimentally  the  parallel  properties  of 
two  of  the  fundamental  algorithms  used  in  recursive  least  squares:  back 
substitution  and  Givens  rotation  sequences. 

Figure  3  shows  the  back  substitution  algorithm  and  Figure  4  shows  a  typical 
Givens  rotation  sequence  needed  in  recursive  least  squares  (for  forming  H  in 
(5))  running  on  an  Intel  iPSC/860  Hypercube  for  various  matrix  data 
distributions.  Notice  how  the  distribution  choice  greatly  affects  execution 
time.  Also  note  that  our  newly  developed  block  distribution 
implementations perform better than published wrap implementations [15]
(denoted  by  "L<Z89"),  experimentally  supporting  the  theory  that  block-oriented 
algorithms  are  preferable  in  parallel  environments. 


Back Substitution Algorithms: Various Distributions on a 16 node iPSC/860 Hypercube


Figure  3.  Comparison  of  Back  Substitution  Parallel  Implementations. 


Givens  Rotation  Sequence:  Various  Distributions  on  a  16  node  iPSC/860  Hypercube 


Figure  4.  Comparison  of  Givens  rotation  sequence  parallel  implementations. 


SCIENTIFIC IMPACT OF RESEARCH


The research conducted so far has concentrated mainly on recursive windowed
linear least squares problems arising in real-time DSP systems derived from
the  covariance  differencing  principle.  The  contribution  of  this  research  is 
twofold.  First,  new  highly  concurrent  algorithms  amenable  to  efficient 
parallel  implementation  were  developed  and  their  numerical  properties 
analyzed.  Second,  tests  have  been  conducted  indicating  the  feasibility  of  a 
block  approach  to  recursive  linear  least  squares  problems.  What  remains  to  be 
done  is  the  formulation  of  parametric  models  that  can  predict  the  best 
numerical  and  parallel  implementation  for  recursive  least  squares  problems 
based  on  the  specified  requirements  of  the  user. 


DEGREES AWARDED


None

FAULT TOLERANT BEAMFORMING ALGORITHMS


Task  #6 

Task Principal Investigator: Franklin T. Luk

(518) 276-8326


OBJECTIVE

Multidimensional  signal  processing  in  the  context  of  processing  signals 
received  by  an  array  of  sensors  has  many  important  applications.  The  type  of 
filtering  that  can  be  conveniently  applied  to  signals  carried  by  propagating 
waves  is  beamforming,  which  seeks  to  isolate  signal  components  that  are 
propagating  in  a  particular  direction.  Although  computationally  expensive, 
the  beamforming  procedure  has  been  rapidly  rising  in  popularity  due  to 
advances  in  both  matrix  algorithms  and  systolic  arrays.  Most  systolic  arrays 
will  be  deployed  in  harsh  environments  and  are  thus  susceptible  to  frequent 
transient  errors.  The  principal  objective  of  this  task  is  to  develop  systolic  fault 
tolerant  beamforming  techniques.  Special  attention  will  be  paid  to 
computing  complex  matrix  decompositions,  avoiding  numerical  overflows, 
differentiating  between  errors  arising  from  numerical  roundoff  buildups  and 
those  from  hardware  failures,  and  interrupting  the  operation  of  systolic  arrays 
for  error  correction. 

DISCUSSION  OF  STATE  OF  THE  ART 

The  compatibility  of  systolic  arrays  and  algorithms  with  both  matrix 
computations  and  today’s  VLSI  technology  guarantees  their  future  use  as  key 
components  in  any  signal  processing  system.  An  especially  important  systolic 
algorithm  is  the  orthogonal  triangulation  algorithm  (QR  decomposition)  for 
least  squares  minimization,  a  crucial  step  in  most  adaptive  antenna 
processing  algorithms.  The  importance  of  these  problems  is  evidenced  by  two 
major systolic array projects: one at MIT's Lincoln Laboratory [1] and the other
at  the  United  Kingdom's  Royal  Signals  and  Radar  Establishment  (RSRE)  [2]. 
However,  traditional  fault  tolerance  techniques  such  as  modular  redundancy 
have  been  regarded  as  too  costly  and  unwieldy  to  implement  on  these  systolic 
arrays.  In  [3],  a  JSEP  supported  work,  we  presented  a  simple  fault  tolerance 
scheme  for  the  QR  decomposition  and  showed  how  it  can  be  easily 
incorporated  into  the  RSRE  systolic  arrays  for  recursive  least  squares 
minimization.  Our  work  has  already  won  recognition  at  the  RSRE  as  a 
possible  fault  tolerance  technique  for  their  systolic  arrays  [4]. 

Data  matrices  that  are  ill-conditioned  call  for  a  more  robust  and  more 
expensive  numerical  technique  known  as  the  singular  value  decomposition 
(SVD).  An  SVD  systolic  array  designed  by  us  has  been  adopted  for  hardware 
implementation at both the RSRE [5] and Computational Engineering, Inc. [6].
The  implementation  of  the  latter  will  be  used  for  real  time  system  control;  its 
application  in  the  wing  flutter  analysis  of  supersonic  planes  has  been  proven 
in  a  wind  tunnel  test  at  an  Air  Force  Laboratory  in  Ohio.  The  problem  of 
fault tolerant computation of the singular value decomposition still awaits a
satisfactory solution. Schemes were reported in [7], but they are so complicated that triple
modular  redundancy  may  well  be  a  better  choice. 

Existing  fault  tolerance  schemes  have  often  been  ignored  by  systolic  array 
designers  because  they  are  too  costly  and  unwieldy  to  implement.  An 
attractive  new  idea  came  in  the  form  of  algorithm-based  fault  tolerance.  This 
approach employs three steps: encode the input data, execute the algorithm
on  the  encoded  input  to  produce  encoded  output,  and  decode  the  output  to 
detect  and  perhaps  correct  errors.  Both  checksum  and  weighted  checksum 
encoding  schemes  have  been  developed  by  Abraham  et  al.  [8,  9],  who  showed 
that a variety of matrix operations preserve the checksum property.
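
The encode, execute, decode cycle can be illustrated with matrix multiplication, one of the operations that preserves the checksum property. The sketch below uses plain (unweighted) checksums and locates a single corrupted entry; it is a minimal illustration under those assumptions, and the function names are hypothetical rather than taken from [8, 9].

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def column_checksum(A):
    """Encode A by appending a row holding the sum of each column."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]


def row_checksum(B):
    """Encode B by appending a column holding the sum of each row."""
    return [row[:] + [sum(row)] for row in B]


def locate_single_error(C, tol=1e-9):
    """Decode a full-checksum matrix C.

    Returns None if all checksums agree, or (i, j, delta) if exactly one
    data row and one data column disagree, where delta is the amount by
    which C[i][j] is too large. Raises ValueError for any other pattern.
    """
    n, m = len(C) - 1, len(C[0]) - 1
    row_err = [sum(C[i][:m]) - C[i][m] for i in range(n)]
    col_err = [sum(C[i][j] for i in range(n)) - C[n][j] for j in range(m)]
    bad_rows = [i for i, e in enumerate(row_err) if abs(e) > tol]
    bad_cols = [j for j, e in enumerate(col_err) if abs(e) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0], row_err[bad_rows[0]]
    raise ValueError("error pattern is not a single corrupted data entry")


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(column_checksum(A), row_checksum(B))  # execute on encoded input
clean = locate_single_error(C)

C[0][1] += 5                                     # simulate a transient fault
fault = locate_single_error(C)
i, j, delta = fault
C[i][j] -= delta                                 # correct the entry
print(clean, fault, C)
```

The product of a column-checksum matrix and a row-checksum matrix carries both kinds of checksum, so an error in one data entry shows up in exactly one row sum and one column sum.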

In  [9,  10]  a  linear  algebraic  interpretation  of  the  weighted  checksum  scheme 
was  proposed.  Such  a  model  allows  parallels  to  be  drawn  between  algorithm- 
based  fault  tolerance  and  coding  theory,  and  makes  it  possible  to  examine  in 
detail  the  difficulties  in  choosing  weight  vectors  such  that  the  correction 
vector  can  be  explicitly  resolved.  The  hard  problem  of  how  to  determine  the 
exact  number  of  errors  that  have  occurred  has  been  solved  in  [11].  For  error 
correction,  prior  to  [11],  it  was  known  only  how  to  correct  a  weighted 
checksum  scheme  for  the  cases  of  one  error  [9]  and  two  errors  [10].  In  [11]  a 
theoretical  framework  was  given  which  would  enable  one  to  solve  the 
correction  problem  for  the  general  case. 

The  weighted  checksum  technique  has  been  demonstrated  to  be  effective  in 
multiple  error  detection.  It  has  been  shown  that,  in  order  to  guarantee  error 
detection,  the  chosen  weight  vectors  must  satisfy  some  very  specific 
properties  about  linear  independence.  Previously,  appropriate  sets  of  weight 
vectors  have  been  proposed  which  are  powers  of  integers  [9,  12];  these  suffer 
from  the  fact  that  the  weights  can  become  very  large.  In  [13, 14]  a  new  scheme 
was  presented  that  generates  weight  vectors  to  meet  the  requirements  about 
independence  and  to  avoid  the  difficulties  with  overflow. 
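A small sketch may help show both the idea and the overflow concern. It assumes weight vectors whose entries are powers of the integers 1, 2, ..., n, as described above, and handles only the single-error case; the function names are hypothetical, and this is not the scheme of [13, 14].

```python
def power_weights(n, r):
    """Return r weight vectors of length n; vector k holds (j + 1) ** k."""
    return [[(j + 1) ** k for j in range(n)] for k in range(r)]


def checksums(x, weights):
    """Weighted checksums of x, one per weight vector."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def locate_single_error(received, stored, weights):
    """Return (index, magnitude) of a single error, or None if none is seen.

    Assumes at most one element is wrong and that the first two weight
    vectors are those produced by power_weights (all ones, then 1..n).
    """
    fresh = checksums(received, weights[:2])
    d0, d1 = (new - old for new, old in zip(fresh, stored))
    if d0 == 0 and d1 == 0:
        return None
    if d0 == 0 or d1 % d0 != 0:
        raise ValueError("syndromes are inconsistent with a single error")
    index = d1 // d0 - 1          # d1 / d0 equals the weight (index + 1)
    if not 0 <= index < len(received):
        raise ValueError("syndromes are inconsistent with a single error")
    return index, d0
```

The growth of the weights is easy to see: for vectors of length 100 and eight weight vectors, the largest entry is 100 ** 7, which already exceeds the range of a 32-bit integer.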

PROGRESS 

The  problem  of  linearly  constrained  least  squares  has  many  applications  in 
signal  processing.  In  a  paper  that  will  appear  in  the  journal  Integration,  we 
present  a  perturbation  analysis  of  a  linearly  constrained  least  squares 
algorithm  for  adaptive  beamforming.  The  perturbation  bounds  for  the 
solution  as  well  as  for  the  latest  residual  element  are  derived.  We  also 
propose  an  error  estimation  scheme  for  the  residual  element,  which  can  be 
incorporated  into  a  systolic  array  implementation  of  the  algorithm. 

We  have  devised  a  new  approach  to  the  parallel  computation  of  the  Singular  Value  Decomposition
(SVD).  The  recursive  and  hierarchical  structure  of  the
new  ordering  coincides  with  that  of  a  hierarchical  connection  network,  in  which
neighboring  processors  are  more  tightly  coupled  than  processors  far  apart,
minimizing  the  communication  overhead.  The  fat  tree  was  used  as  a  first
example  of  a  hierarchical  network.  The  resulting  algorithm  scales  up  nicely
when  the  processors  are  configured  as  a  two-dimensional  fat  tree.  The  best
approach  to  parallel  SVD  computation  is  one  of  the  Jacobi  type;  such
algorithms  have  been  developed  for  hypercube,  linear  array  and  ring
architectures.  Motivated  by  the  fat  tree  architecture,  in  this  research  we  have
developed  an  implementation  of  a  Jacobi  method  on  the  massively  parallel
CM-5.  We  propose  new  Jacobi  orderings  for  efficient  utilization  of  the
communication  network  and  analyze  their  behavior.  Our  paper  has  been
presented  at  an  Army  Conference  on  Applied  Mathematics  and  Computing,
and  will  appear  in  its  Conference  Proceedings.
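To make the role of a Jacobi ordering concrete, the sketch below implements the textbook one-sided (Hestenes) Jacobi method with the classical round-robin ordering. It is an illustration only: it is serial, it assumes an even number of columns, and it does not reproduce the fat-tree orderings developed in this task. The function names are hypothetical.

```python
import math


def round_robin(n):
    """Yield n - 1 rounds; each round holds n / 2 disjoint column pairs (n even)."""
    idx = list(range(n))
    for _ in range(n - 1):
        yield [tuple(sorted((idx[i], idx[n - 1 - i]))) for i in range(n // 2)]
        idx = [idx[0]] + [idx[-1]] + idx[1:-1]   # keep idx[0] fixed, rotate the rest


def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Orthogonalize the columns of a by plane rotations; return their norms."""
    cols = [list(c) for c in zip(*a)]
    for _ in range(sweeps):
        rotated = False
        for rnd in round_robin(len(cols)):
            # Pairs within a round touch disjoint columns, so on a parallel
            # machine they could be processed at the same time.
            for p, q in rnd:
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

The ordering decides which column pairs meet in each round, and therefore which processors must exchange data; this is the part that an ordering matched to the interconnection network is meant to improve.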

SCIENTIFIC  IMPACT  OF  RESEARCH

Our  work  is  having  a  significant  impact:  it  has  attracted  considerable  attention,
and  several  researchers  are  attempting  to  improve  on  it  [15-17].
We  are  most  proud  of  our  result  in  discovering  the  relationship  between  the 
famous  Berlekamp-Massey  algorithm  for  decoding  the  Reed-Solomon  code, 
and  the  well  known  Lanczos  algorithm  in  numerical  computing. 

DEGREES  AWARDED

None 

Task  #7

Task  Principal  Investigator  H.  C.  Torng

(607)  255-5191 


OBJECTIVE

All  the  emerging  processors,  such  as  the  Pentium  from  Intel,  to  be
announced  in  the  spring  of  1993,  and  the  IBM  RS6000  "Power"  architecture,
employ  multiple  functional  units,  and  issue  and  execute  multiple  instructions,  possibly
out-of-order,  from  an  instruction  stream.  The  objective  of  our
task  is  to  address  two  important  issues  that  confront  the  development  of 
these  processors:  interrupt  handling  and  branch  processing. 

An  indispensable  requirement  for  modern  processors  is  that  they  handle 
interrupts  promptly  and  precisely.  Interrupt  requests  have  to  be  promptly
handled  because  tasks  that  initiate  these  requests  have  to  be  processed  as  soon 
as  possible.  Responding  to  an  interrupt  request,  a  processor  first  stores  its 
processor  state;  this  has  to  be  done  precisely  so  that  the  interrupted  process 
can  be  resumed  at  the  very  point  of  interruption. 

In  a  modern  processor  with  multiple  functional  units,  which  may  include
arithmetic  and  logic  units  (ALUs),  multipliers,  branch  handlers,  and
memory  access  units,  multiple  instructions  are  at  various  stages  of  execution
at  a  given  time  instant;  it  is  extremely  difficult  to  identify  and  store  a  precise
interrupt  point  with  an  acceptable  latency.

Realistic  programs  always  contain  conditional  branch  instructions.  It  takes 
time  to  resolve  each  and  every  one  of  these  conditional  branches;  and  there  is 
an  inevitable  delay  to  access  the  target  instruction  once  the  branch  uncertainty 
is  resolved.  These  undesirable  effects  are  even  more  prominent  in  computer 
systems  which  issue  multiple  and  out-of-order  instructions. 

We  seek  solutions  to  these  two  problems,  which  are  important  for  real-time 
signal  processing  systems  as  well  as  for  general  computer  applications. 

DISCUSSION  OF  STATE-OF-THE-ART 

Interrupts  and  exceptions  can  be  classified  into  three  types:  external 
interrupts,  exception  traps,  and  software  traps.  External  interrupts  are 
initiated  by  sensors  and  other  outside  elements;  the  computer  system  being 
interrupted  is  called  upon  to  perform  certain  tasks  of  higher  priority.  When  a 
processor  encounters  an  abnormality  in  its  processing,  such  as  division  by
zero,  overflow,  or  illegal  operations,  it  generates  exception  traps.  Software 
traps  are  installed  by  programmers  in  an  instruction  stream  to  initiate 
interrupt  requests;  with  these  pre-planned  interrupts,  programmers  gain
additional  control  over  the  processing  of  their  programs.

In  response  to  an  interrupt  request,  a  processor  saves  its  processor  state,  then 
loads  and  executes  an  appropriate  interrupt  handler.  Upon  completion  of  the
interrupt  handling  routine,  the  processor  restores  the  saved  processor  state 
and  then  restarts  the  interrupted  process. 

A  processor  state  should  contain  only  enough  information  so  that  the 
interrupted  process  can  be  restarted  at  the  precise  point  where  it  was 
interrupted.  To  be  able  to  resume  an  interrupted  process,  the  processor  state 
should  consist  of  the  contents  of  the  general  purpose  registers,  the  program 
counter,  the  condition  registers,  all  index  registers  and  the  relevant  portion  of 
the  main  memory.  A  processor  state  consumes  a  large  amount  of  precious 
memory. 

For  processors  which  issue  at  most  one  instruction  per  machine  cycle,  it  is
relatively  easy  to  identify  precisely  the  point  where  a  process  is  interrupted.
Among  other  vital  items,  the  address  of  a  specific  instruction,  say  instruction
a,  is  saved.  All  instructions  that  precede  instruction  a  have  been  executed,
whereas  instruction  a  and  those  that  follow  it  have  not.  Instruction  a  thus
provides  a  precise  interrupt  point.
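The notion of a precise interrupt point in a one-instruction-at-a-time machine can be sketched with a toy interpreter. Everything here is hypothetical: instructions are (dest, src1, src2) additions, the saved state is just the program counter and registers, and the handler is assumed not to touch the interrupted program's registers.

```python
def run(program, regs, pc=0, interrupt_at=None):
    """Execute add-instructions strictly in program order.

    Stops before executing instruction `interrupt_at` (instruction a) and
    returns the processor state at that point; otherwise runs to the end.
    """
    while pc < len(program):
        if pc == interrupt_at:
            break                       # everything before a is done, a is not
        dest, src1, src2 = program[pc]
        regs[dest] = regs[src1] + regs[src2]
        pc += 1
    return {"pc": pc, "regs": dict(regs)}


def run_with_interrupt(program, regs, a, handler):
    """Save the state at instruction a, run the handler, restore, and resume."""
    saved = run(program, dict(regs), interrupt_at=a)
    handler()
    return run(program, dict(saved["regs"]), pc=saved["pc"])
```

Because the saved program counter cleanly separates executed from unexecuted instructions, resuming from it gives the same final state as an uninterrupted run.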

Unfortunately,  modern  processors,  such  as  the  Intel  Pentium  and  IBM  RS 
6000,  may  issue  more  than  one  instruction  per  machine  cycle  and  some  of 
these  instructions  are  issued  out-of-order.  The  identification  of  a  precise 
interrupt  point  becomes  very  difficult.  We  have  addressed  this  task  and  have 
found  a  promising  solution.  A  brief  summary  of  the  state  of  the  art  is 
presented  below: 

The  machines  designed  by  Cray  [1,  2]  have  multiple  functional  units  and  do
allow  instructions  to  be  executed  out-of-order.  In  response  to  interrupt  requests,
these  machines  allow  instructions  under  execution  to  complete  before  the
processor  state  is  stored;  this  means  that  they  do  not  respond  to  interrupt
requests  promptly.  For  machines  designed  for  off-line  scientific  computing,
this  is  acceptable.  It  is,  however,  not  desirable  or  even  permissible  for  real-time
applications.  In  the  IBM  360/91  [3],  a  similar  "long-latency"  approach  was
adopted  by  allowing  all  issued  instructions  to  complete  their  execution.  The
IBM  360/91  does  allow  an  "imprecise"  interrupt  which  ignores  the  interrupt
point;  in  other  words,  the  interrupted  process  cannot  be  restarted  later.

Checkpointing  has  been  proposed  to  facilitate  precise  interrupt  handling  [4]. 
In  such  a  scheme,  a  minimum  of  two  checkpoints  and  hence  two  additional 
processor  states  have  to  be  maintained.  Clearly,  the  checkpointing  approach
degrades  system  performance,  both  in  the  time  required  to  process  a
given  task,  and  in  the  time  required  to  restore  a  consistent  processor  state
upon  receiving  an  interrupt  request.  The  processing  time  for  a  task  is
increased  by  the  need  to  update  state  information  as  the  states  change,  and
by  the  execution  of  additional  read  instructions  which  must  precede  all
instructions  which  alter  the  memory.  It  should  also  be  pointed  out  that  each
checkpoint  corresponds  to  a  processor  state;  the  memory  and  other  hardware
required  to  implement  each  and  every  one  of  these  checkpoints  are  extremely
costly.

Smith  and  Pleszkun  [5]  presented  several  schemes  to  realize  the  classical 
precise  interrupts.  By  introducing  "re-order  registers",  "history  buffers"  and
"future  files",  they  make  sure  that  the  instructions  are  completed  "in-order", 
even  though  they  may  be  executed  out-of-order.  In  so  doing,  a  precise 
interrupt  point  can  be  identified  and  saved.  This  comes  at  the  cost  of  an  unavoidable
performance  degradation.  We  should  and  can  do  better.
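The idea of executing out-of-order while completing in order can be sketched as a small buffer that only retires finished instructions from its head. This is a simplified illustration, not the design of [5]; the class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy buffer: results may arrive in any order but commit in program order."""

    def __init__(self):
        self.entries = deque()     # program order, oldest instruction on the left
        self.registers = {}        # committed (precise) register state

    def issue(self, tag, dest):
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def complete(self, tag, value):
        """Record a result; a functional unit may call this in any order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        raise KeyError(tag)

    def commit(self):
        """Retire finished entries from the head only; return their tags."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.registers[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed

    def interrupt(self):
        """Drop uncommitted work; return the tag of the precise interrupt point."""
        point = self.entries[0]["tag"] if self.entries else None
        self.entries.clear()
        return point
```

The cost mentioned above shows up here as finished results that wait in the buffer until every older instruction has also finished.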

The  effectiveness  of  processors  with  multiple  functional  units  in  realizing 
performance  gains  is  hindered  to  a  great  extent  by  the  presence  of  branches, 
especially  conditional  branches,  in  system  and  application  codes. 

One  is  tempted  to  learn  to  live  with  a  conditional  branch  by  waiting  patiently 
for  its  resolution.  However,  the  number  of  instructions  that  can  be  examined 
at  run  time  by  the  processor  for  possible  concurrent  issuance  will  be  limited 
by  the  size  of  a  basic  block,  which  is  usually  not  that  large.  We  have  to  deal 
with  conditional  branches  more  aggressively. 

Schemes  have  been  proposed  and  implemented  to  predict  the  outcome  of  a 
conditional  branch  [6,  7],  and  to  fetch  and  execute  instructions  on  the 
predicted  path.  Of  course,  it  should  be  expected  that  we  may  encounter  more 
conditional  branches  as  we  process  along  the  selected  path.  What  has  been
gained  is  that  with  branch  prediction  we  have  essentially  enlarged  a  basic 
block  and  made  many  more  instructions  available  for  concurrent  processing. 
There  is  one  caveat:  some  of  the  branch  predictions  are  going  to  be  wrong.
When  that  happens,  the  processor  has  to  be  able  to  restore  the  processor  state
to  the  point  where  the  correct  path  is  to  be  taken.  Not  only  do  we  lose  the 
work  that  has  been  done  on  the  wrongly  predicted  path,  but  also  we  have  to 
take  time  to  restore  the  processor  state. 

In  order  to  make  a  branch  prediction  scheme  worthwhile,  we  have  to  have 
the  following: 

1.  a  good  branch  prediction  mechanism,  which  makes  most  of  the 
predictions  correctly;  and 
2.  an  efficient  recovery  mechanism,  which  does  not  degrade  the
processor  performance  in  normal  operation  and  which  restores  the  processor  to  an
appropriate  state  when  a  prediction  turns  out  to  be  incorrect.
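As an illustration of the first requirement, the sketch below uses a table of two-bit saturating counters, a widely known prediction technique. It is offered as an example only and is not claimed to be the scheme of [6, 7]; the class name, table size, and initial counter value are assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the branch address."""

    def __init__(self, table_size=16):
        # Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
        self.table = [1] * table_size

    def _index(self, pc):
        return pc % len(self.table)

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome, saturating."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) outcomes in trace that were predicted correctly."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)
```

For a loop branch that is taken nine times and then falls through, this predictor misses the first iteration and the exit on a cold start, and only the exit on later passes. Every such miss is where the second requirement, fast recovery, comes into play.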

PROGRESS

We  have  made  considerable  progress  in  the  following  investigations: 

Interrupt  Handling  [8,  9,  10,  11]

We  have  identified  factors  that  must  be  considered  in  evaluating  the
effectiveness  of  interrupt  and  exception  handling  schemes:  latency,  cost,  and 
performance  degradation. 

We  have  formulated  and  evaluated  an  efficient  hardware  mechanism,  the 
Instruction  Window  (IW),  and  a  new  approach,  which  allows  for  precise, 
responsive  and  flexible  interrupt  and  exception  handling. 

We  have  examined  in  detail  the  implementation  of  the  IW.  The  design  of  an 
8-cell  IW  has  been  carried  out.  We  have  established  that  such  an  instruction 
window  can  work  with  a  very  short  machine  cycle  time;  this  is  important 
because  we  do  not  want  the  introduction  of  the  IW  to  reduce  the  processor 
clock  rate. 

We  have  undertaken  a  comparison  of  all  interrupt  and  exception  handling 
schemes  for  out-of-order  execution  processors. 

A  paper  reporting  on  this  topic  appears  in  the  January  1993  issue  of  the  IEEE
Trans.  on  Computers.  The  patent  application  pending  since  January  1990
should  be  resolved  soon.

We  have  completed  the  study  of  the  Fast  Dispatch  Stack  (FDS)  system,  which 
provides  another  approach  to  fast,  precise  interrupt  handling.  In  addition  to 
instruction  issuance,  the  FDS  serves  as  a  "re-order"  buffer  for  the  processor; 
this  extends  considerably  the  state  of  the  art  in  this  important  area. 

Branch  Processing  [9,  10,  11]

We  have  established  that  the  Fast  Dispatch  Stack  (FDS)  can  be  used  to  reduce  the 
penalty  when  a  branch  is  predicted  incorrectly.  Conditional  branch  instructions 
along  with  those  instructions  on  the  predicted  branch  path  are  brought  into  the 
FDS  to  be  executed  speculatively.  The  execution  results  are  written  into  an 
additional  set  of  registers,  called  the  "working  registers"  for  temporary  storage 
and  accesses  by  subsequent  instructions.  These  results  are  copied  into  the 
"architected  registers"  once  the  instructions  are  retired  from  the  FDS. 

We  now  require  that  a  new  instruction  removal  mode  be  instituted:  only  those
instructions  at  the  top  of  the  FDS  may  be  retired;  multiple  instructions  are
retired  at  once  if  they  form  a  contiguous  sequence  of  instructions  at  the  top.

When  the  prediction  made  for  a  conditional  branch  is  found  to  be  correct,  the 
branch  instruction  can  be  removed  from  the  FDS  when  it  is  a  member  of  a 
contiguous  segment  of  completed  instructions,  including  the  top  one.  Again, 
please  note  that  when  an  instruction  is  retired  from  the  FDS,  its  result  is  made 
permanent  by  being  copied  into  an  architected  register. 

When  the  prediction  made  for  a  conditional  branch  is  found  to  be  incorrect,  the 
branch  instruction  will  not  be  retired  from  the  FDS.  All  instructions  which 
follow  the  branch  are  removed  immediately.  When  the  branch  instruction
reaches  the  top  of  the  FDS,  the  contents  of  the  architected  registers  are  copied
into  the  working  registers.  The  FDS  is  then  filled  with  instructions  from  the 
correct  path.  In  other  words,  the  correction  required  can  be  accomplished  very 
quickly,  often  in  one  machine  cycle;  this  reduces  the  penalty  for  wrong  guesses 
and  thus  facilitates  the  use  of  branch  prediction  as  a  means  for  performance 
enhancement. 
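The retirement and recovery rules described above can be sketched as follows. This is a simplified model, not the FDS hardware: the class and method names are hypothetical, and the sketch assumes that a mispredicted branch is dropped from the stack once the working registers have been restored, a detail the text does not specify.

```python
class SpeculativeStack:
    """Toy dispatch stack with working and architected register sets."""

    def __init__(self, registers):
        self.architected = dict(registers)   # permanent state
        self.working = dict(registers)       # speculative results
        self.stack = []                      # index 0 is the top (oldest)

    def dispatch(self, tag, dest=None):
        self.stack.append({"tag": tag, "dest": dest, "value": None,
                           "done": False, "mispredicted": False})

    def complete(self, tag, value=None, mispredicted=False):
        """Record a result; on a wrong prediction, drop all younger entries."""
        for pos, entry in enumerate(self.stack):
            if entry["tag"] == tag:
                entry.update(done=True, value=value, mispredicted=mispredicted)
                if entry["dest"] is not None:
                    self.working[entry["dest"]] = value
                if mispredicted:
                    del self.stack[pos + 1:]
                return
        raise KeyError(tag)

    def retire(self):
        """Retire the contiguous completed run at the top.

        Returns (retired_tags, recovered); recovered is True when a
        mispredicted branch reached the top and the working registers
        were reloaded from the architected registers.
        """
        retired, recovered = [], False
        while self.stack and self.stack[0]["done"]:
            entry = self.stack.pop(0)
            if entry["mispredicted"]:
                self.working = dict(self.architected)
                recovered = True
                break
            if entry["dest"] is not None:
                self.architected[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired, recovered
```

When `retire` reports a recovery, the caller would refill the stack with instructions from the correct path. Since every instruction older than the branch has already retired by then, the architected registers hold exactly the state needed to restart.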

New  Implementation  Schemes  [10,  11,  12]

We  have  addressed  the  concern  that  the  Dispatch  Stack  that  we  have  developed
may  have  an  adverse  impact  on  processor  cycle  time,  and  developed  several  new
schemes:  the  use  of  bit  vectors;  the  use  of  pointers;  and  finally  a  block-based
window.  Details  can  be  found  in  the  cited  publications.

SCIENTIFIC  IMPACT  OF  RESEARCH

We  have  developed  solutions  to  several  outstanding  problems  in  the 
development  of  modern  processors,  which  can  make  the  best  use  of  the 
advancing  device  technologies.  We  have  made  considerable  progress  in 
interrupt  handling,  branch  processing  and  dispatch  stack  implementation. 
These  advances  significantly  enhance  the  performance  of  superscalar
processors  without  raising  the  clock  rate,  and  provide  timely  and  much
needed  solutions  to  outstanding  problems.  We  continue  to  have  ongoing
discussions  with  IBM  and  Intel. 